Random Variables and Discrete Distributions

The previous chapter introduced the sample sum of random draws with replacement from a box of tickets, each of which is labeled "0" or "1." The sample sum is a random variable, and its probability distribution, the binomial distribution, is a discrete probability distribution. This chapter introduces several other random variables and probability distributions that arise from drawing at random from a box of tickets numbered "0" or "1": the geometric distribution, the negative binomial distribution, and the hypergeometric distribution.

Random Variables

A random variable is an assignment of numbers to outcomes of a random experiment. For example, consider the experiment of drawing tickets at random independently from a box of numbered tickets. The possible outcomes of n such draws are sequences of n tickets in a particular order. The sample sum of the numbers on the tickets drawn, which was introduced in the previous chapter, is a random variable. Depending on the sequence of tickets drawn, the sample sum takes different values. To any particular sequence of n tickets, the sample sum assigns the sum of their labels. The sample sum does not distinguish between sequences of tickets in which other tickets with the same labels were drawn, nor in which the tickets were drawn in a different order: The sample sum depends only on the set of labels on the tickets drawn.

In this book, upper-case Roman letters are used to denote random variables; for example, X might stand for the sample sum of 5 independent random draws with replacement from a box containing one ticket labeled "1" and one ticket labeled "0." In that case, the random variable X would have a binomial distribution with parameters n = 5 and p = 50%. A random variable is not a variable in the same sense the word is used in calculus or algebra: It is something that takes a random value, depending on the outcome of a random experiment.

The probability distribution of a (discrete) random variable gives the chances that the random variable takes each of its possible values. That is, for every value x that the random variable X can take, the probability distribution of X gives P(X=x). The probability distribution can be written either as a formula (a function of x), or a table. For example, in the case just described, we might write the probability distribution of X as

$P(X = x) = {}_5C_x \times (1/2)^5$, x = 0, 1, 2, 3, 4, 5,

or as the following table:

Probability distribution of X
x P(X = x)
0 1/32
1 5/32
2 10/32
3 10/32
4 5/32
5 1/32
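
As a quick check of the table, the following Python sketch tabulates the same probabilities with exact fractions. The function name `binomial_pmf` is an illustrative choice, not something defined in the text.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """Chance of exactly x tickets labeled "1" in n independent draws with replacement."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Sample sum of 5 draws from a box with one "0" and one "1": n = 5, p = 1/2.
table = {x: binomial_pmf(x, 5, Fraction(1, 2)) for x in range(6)}
for x, prob in table.items():
    print(x, prob)
```

Python's `Fraction` reduces to lowest terms, so the entries for x = 2 and x = 3 print as 5/16 rather than 10/32.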

We proceed to investigate some discrete probability distributions that arise frequently in practical problems.

Sampling from 0-1 Boxes

Recall that a box of numbered tickets is a 0-1 box if each ticket in the box is numbered either "0" or "1." Suppose that a 0-1 box contains N tickets in all, of which G are labeled "1" and the remaining N−G are labeled "0," so that the fraction of tickets in the box labeled "1" is p=G/N. The previous chapter shows that the sample sum of n random draws with replacement from that box has a binomial distribution with parameters n and p. Many other interesting random variables arise in sampling from 0-1 boxes. This section presents three: the number of random draws with replacement until the first time a ticket labeled "1" is drawn (which has a geometric distribution), the number of random draws with replacement until the rth time a ticket labeled "1" is drawn (which has a negative binomial distribution), and the sample sum of n random draws without replacement (which has a hypergeometric distribution).

The Geometric Distribution

The number of random draws with replacement from a 0-1 box until the first time a ticket labeled "1" is drawn is a random variable with a geometric distribution with parameter p=G/N, where G is the number of tickets labeled "1" in the box and N is the total number of tickets in the box. This subsection finds the geometric distribution.

Consider the number of times one must roll a fair die up to and including the first time the die lands with one spot on top. A box model for this experiment would be to draw tickets at random with replacement from a 0-1 box that contains N=6 tickets of which G=1 are labeled "1" and N−G=5 are labeled "0," until the first time the ticket labeled "1" is drawn.

Box for a fair die to land showing 1 spot: 0 0 0 0 0 1

Let X denote the number of draws until the first time a ticket labeled "1" is drawn, including the draw that yields the ticket labeled "1." The random variable X does not have a binomial distribution: The number of draws is not fixed in advance, and the quantity of interest is the number of draws, not the sample sum of the numbers on the tickets drawn.

Let us calculate the probability distribution of X. It is impossible for X to be less than one: It takes at least one draw to get the ticket labeled "1." The chance that it takes exactly one draw to get the ticket labeled "1"—the chance that X=1—is 1/6. What is the chance that X=2? For X=2 to occur, the first draw must give a ticket labeled "0" and the second draw must give the ticket labeled "1." Because the draws are independent, the chance that these two things occur (draw "0" the first time and "1" the second time) equals the product of their individual unconditional chances. The chance of drawing "0" the first time is 5/6, and the chance of drawing "1" the second time is 1/6, so the chance that it takes two draws is 5/6×1/6 = 5/36.

What is the chance that X=3? The event X=3 occurs if the first draw gives "0," the second draw gives "0," and the third draw gives "1." Because the draws are independent, the event X=3 has chance 5/6×5/6 ×1/6=25/216.

For X to equal x, we first need to have x−1 draws that give a ticket labeled "0" (these each have probability 5/6), then a draw that gives the ticket labeled "1" (which has probability 1/6). Because the draws are independent, we can multiply the chances of these outcomes to determine the chance that they all happen together (the chance of their intersection). Thus the general formula is

$P(X = x) = (5/6)^{x-1} \times (1/6)$, x = 1, 2, 3, … .

Let X be the number of trials to the first success in independent trials with the same probability p of success. Then X behaves like the number of times one needs to draw tickets at random with replacement from a 0-1 box that contains a proportion p=G/N of tickets labeled "1" until a ticket labeled "1" is drawn for the first time. X is a random variable with a geometric distribution with parameter p, and

$P(X = x) = (1 - p)^{x-1} \times p$, for x = 1, 2, 3, … .

The die-rolling example is the special case p = 1/6. Note that a random variable that has the geometric distribution has an infinite (but countable) number of possible values: all the positive integers. The sum of the chances that a random variable with the geometric distribution takes the values 1, 2, 3, … , is 100%:

P(X=1) + P(X=2) + P(X=3) +  … = 100%.
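
The geometric formula can be checked numerically. The sketch below uses exact fractions and an illustrative function name, `geometric_pmf`; it reproduces the three die-rolling chances worked out above and shows that the partial sums approach 100%.

```python
from fractions import Fraction

def geometric_pmf(x, p):
    """Chance that the first ticket labeled "1" appears on draw x (x = 1, 2, 3, ...)."""
    if x < 1:
        return 0 * p
    return (1 - p) ** (x - 1) * p

p = Fraction(1, 6)  # fair die landing with one spot on top
first_three = [geometric_pmf(x, p) for x in (1, 2, 3)]  # 1/6, 5/36, 25/216

# The partial sum over x = 1..m equals 1 - (1 - p)**m, which approaches 100% as m grows.
partial = sum(geometric_pmf(x, p) for x in range(1, 51))
print(first_three, float(partial))
```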

The essential requirement for a random variable to have the geometric distribution is that it counts the number of trials to the first success in independent trials with the same probability p of success in each trial. If we identify "success" with drawing a ticket labeled "1" from a 0-1 box, and identify "failure" with drawing a ticket labeled "0" from the box, each draw can be thought of as a trial that results in success or failure. If we draw at random independently with replacement, the trials are independent and have the same probability p of success (the proportion p=G/N of tickets labeled "1" does not change from draw to draw). If a random variable is not equivalent to the number of trials until the first success in independent trials with the same probability of success, the random variable does not have a geometric distribution.

Suppose I toss a fair coin over and over until it lands heads. What is the chance I toss the coin fewer than 5 times?

Solution. A box model for the experiment is to draw at random with replacement from a 0-1 box containing one ticket labeled "1" and one ticket labeled "0" until the first time the ticket labeled "1" is drawn. We seek the probability of drawing the ticket labeled "1" for the first time on the first, second, third, or fourth draw. In each draw, the chance of getting the ticket labeled "1" and the chance of getting the ticket labeled "0" are both 50%. The chance "1" is drawn on the first draw is 50%. The chance "1" is drawn for the first time on the second draw is 1/2 ×1/2 = 25%. The chance "1" is drawn for the first time on the third draw is 1/2 ×1/2 ×1/2 = 12.5%. The chance "1" is drawn for the first time on the fourth draw is 1/2 ×1/2 ×1/2 ×1/2 = 6.25%. These events are mutually exclusive, so the chance "1" is drawn for the first time before the fifth draw (the chance of the union of the events) is the sum of their chances:

50% + 25% + 12.5% + 6.25% = 93.75%.

That also must be the chance that I have to toss a fair coin fewer than five times before it lands heads. We could also find the probability more easily using the Complement Rule: The chance that it takes fewer than five tosses is 100% minus the chance that it takes at least five tosses. The chance that it takes at least five tosses is the chance that the first four tosses all yield tails, which is 1/2×1/2×1/2 ×1/2 = 1/16 = 6.25%, because the tosses are independent. The chance that the first head occurs before the fifth toss is thus 100%−6.25%=93.75%.
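
Both routes to the answer can be confirmed with a few lines of Python. This is only a sketch of the arithmetic in the solution; the variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 2)  # chance of heads on each toss of a fair coin

# Direct route: add the chances that the first head comes on toss 1, 2, 3, or 4.
direct = sum((1 - p) ** (x - 1) * p for x in range(1, 5))

# Complement Rule: 100% minus the chance that the first four tosses are all tails.
complement = 1 - (1 - p) ** 4

print(direct, complement, float(direct))
```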


Negative Binomial Distribution

In random draws with replacement from a 0-1 box that contains a fraction p=G/N of tickets labeled "1," the number of draws needed to get a ticket labeled "1" for the rth time has a negative binomial distribution with parameters p and r. (When r=1, the negative binomial distribution is the same as the geometric distribution—the distribution of the number of draws until a ticket labeled "1" is drawn for the first time.) Suppose X is a random variable with the negative binomial distribution with parameters p and r. What is P(X = k)—the chance that it takes k draws to get a ticket labeled "1" the rth time?

Clearly, it takes at least r draws, so if k < r, P(X = k) = 0. Consider, therefore, the chance for k ≥ r. How can the kth draw give a ticket labeled "1" for the rth time? The first k−1 draws must give a ticket labeled "1" exactly r−1 times and a ticket labeled "0" (k−1)−(r−1) = k−r times, and the kth draw must give a ticket labeled "1." We already know how to find the probability that k−1 independent draws from a 0-1 box give a ticket labeled "1" r−1 times: The number of tickets labeled "1" in k−1 independent random draws with replacement from a 0-1 box has a binomial distribution with parameters k−1 and p, so the chance of getting a ticket labeled "1" r−1 times in k−1 draws is

${}_{k-1}C_{r-1} \times p^{r-1} \times (1-p)^{(k-1)-(r-1)} = {}_{k-1}C_{r-1} \times p^{r-1} \times (1-p)^{k-r}$.

The chance that the kth draw gives a ticket labeled "1" is p, and because the draws are all independent, the chance that the first k−1 draws give exactly r−1 tickets labeled "1" and the kth draw also gives a ticket labeled "1" is the product of the probability that the first k−1 draws give r−1 tickets labeled "1" and the probability that the kth draw gives a ticket labeled "1;" namely,

${}_{k-1}C_{r-1} \times p^{r-1} \times (1-p)^{k-r} \times p = {}_{k-1}C_{r-1} \times p^{r} \times (1-p)^{k-r}$,

for k = r, r + 1, r + 2, … ; and 0 if k < r.

This is the negative binomial probability distribution with parameters p and r.

The essential requirement for a random variable to have the negative binomial distribution is that it counts the number of trials to the rth success in independent trials with the same probability p of success in each trial. As described earlier in this chapter, if we identify success with drawing a ticket labeled "1" from a 0-1 box and identify failure with drawing a ticket labeled "0" from the box, each draw can be thought of as a trial that results in success or failure. If we draw at random with replacement, the trials are independent and have the same probability p of success: the proportion of tickets labeled "1" does not change from draw to draw. If a random variable is not equivalent to the number of trials it takes to get the rth success in independent trials with the same probability p of success in each trial, it does not have a negative binomial distribution. The following example applies the negative binomial distribution.

I shall toss a fair coin repeatedly (and independently) until the coin lands heads the sixth time.

  1. What is the chance that it takes exactly ten tosses?
  2. What is the chance that it takes more than 7 tosses?

Solution.

  1. The number of tosses it takes to get six heads is the number of trials to the sixth success in independent trials each of which has probability p=50% of success. It therefore has the negative binomial distribution with parameters p = 50% and r = 6, so the chance that it takes exactly 10 tosses is:

    ${}_9C_5 \times 0.5^6 \times (1-0.5)^4 = 126 \times 0.5^{10}$ = 12.30%.

  2. Notice that it takes at least 6 tosses to get heads for the sixth time; applying the Complement Rule, we see that the chance we seek is:

    100%−(chance it takes 7 or fewer tosses) = 100% − (chance it takes 6 tosses or 7 tosses)

    = 100% − [(chance it takes exactly 6 tosses) + (chance it takes exactly 7 tosses)]

    = 100% − (chance it takes exactly 6 tosses) − (chance it takes exactly 7 tosses)

    = 100% − ${}_5C_5 \times 0.5^6 \times (1-0.5)^0$ − ${}_6C_5 \times 0.5^6 \times (1-0.5)^1$

    = 100% − $0.5^6$ − $6 \times 0.5^7$

    = 93.75%.
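
Both answers can be verified with a short Python sketch of the negative binomial formula derived above. The function name `negative_binomial_pmf` is illustrative.

```python
from math import comb

def negative_binomial_pmf(k, r, p):
    """Chance that the r-th ticket labeled "1" appears on draw k."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

p, r = 0.5, 6  # fair coin, sixth head

# Part 1: exactly ten tosses.
exactly_ten = negative_binomial_pmf(10, r, p)

# Part 2: more than 7 tosses, by the Complement Rule.
more_than_seven = 1 - negative_binomial_pmf(6, r, p) - negative_binomial_pmf(7, r, p)

print(f"{exactly_ten:.2%}", f"{more_than_seven:.2%}")
```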


The Hypergeometric Distribution

The sample sum of the labels on the tickets in a random sample without replacement from a 0-1 box has the hypergeometric distribution. A random sample without replacement of size n from a population of N units, also called a simple random sample of size n from a population of N units, is a sample drawn in such a way that every subset of n of the N units in the population is equally likely to be the sample. Equivalently, the sample has a uniform distribution on the ${}_NC_n$ subsets of n of the N tickets. One can think of a simple random sample as being drawn in n steps: A unit is drawn at random from all N units in the population, then a unit is drawn at random from the remaining N−1 units, and so on. In the last step, a unit is drawn at random from the remaining N−n+1 units. In each step, the draw is equally likely to yield each remaining unit.

Consider drawing a simple random sample of size n from a 0-1 box of N tickets of which G are labeled "1" and the other N−G are labeled "0." What is the chance that the sample sum of the labels on the tickets in the sample will equal k? Equivalently, what is the chance that exactly k of the n tickets in the sample will be labeled "1" (and the remaining n−k will be labeled "0")?

We can solve this problem using the Fundamental Rule of Counting, and what we know about combinations. Let X be the (random) number of tickets labeled "1" in the simple random sample of size n. Clearly, X cannot exceed the smaller of n and G (there cannot be more tickets labeled "1" in the sample than there are tickets in the sample, and there cannot be more tickets labeled "1" in the sample than there are in the box, because the sample is drawn without replacement). Furthermore, X must be at least the larger of zero and n − (N−G): We cannot get a negative number of tickets labeled "1," and there are only N−G tickets labeled "0" in the population, so if the sample is larger than N−G, it must contain some of the tickets labeled "1."

Subject to these restrictions, how can the sample contain k tickets labeled "1"? We can think of drawing such a sample as a sequence of two experiments: drawing k tickets labeled "1" from the population of G tickets labeled "1," then drawing n−k tickets labeled "0" from the population of N−G tickets labeled "0," resulting in a subset of size n from the population of size N. By the Fundamental Rule of Counting, there are

${}_GC_k \times {}_{N-G}C_{n-k}$

ways of doing this (recall that ${}_GC_k$ is the number of combinations of k of a set of G things). The total number of possible outcomes of drawing n tickets without replacement is the number of subsets of size n one can form from the population of N tickets, which is ${}_NC_n$. By assumption, our method of drawing samples makes each of these equally likely. The chance that the simple random sample contains exactly k tickets labeled "1" is thus

$\frac{{}_GC_k \times {}_{N-G}C_{n-k}}{{}_NC_n}$,

for max(0, n − (N−G)) ≤ k ≤ min(n, G). (In case the notation is unfamiliar, max(a, b) is the larger of a and b; min(a, b) is the smaller of a and b.) The chance is 0 if k is outside that range of possibilities.

This is the hypergeometric probability distribution with parameters N, G, and n: It gives for each k the chance that the sample sum of the labels on the tickets equals k, for a simple random sample (a random sample without replacement) of size n from a box of N tickets of which G are labeled "1" and the rest are labeled "0."

Sometimes it is convenient to think of the draws as being from a population of N objects of which G are "good" and N−G are "bad." The number of good objects in a simple random sample of size n from such a population has the hypergeometric probability distribution with parameters N, G, and n.
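
The hypergeometric formula translates directly into code. The following Python sketch is one way to write it; the function name and the small box used to exercise it (N = 5, G = 2, n = 3) are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(k, N, G, n):
    """Chance of exactly k tickets labeled "1" in a simple random sample of size n
    from a box of N tickets of which G are labeled "1"."""
    # Outside the range max(0, n - (N - G)) <= k <= min(n, G) the chance is 0.
    if k < max(0, n - (N - G)) or k > min(n, G):
        return Fraction(0)
    return Fraction(comb(G, k) * comb(N - G, n - k), comb(N, n))

# Hypothetical small box: N = 5 tickets, G = 2 labeled "1", sample size n = 3.
dist = {k: hypergeometric_pmf(k, 5, 2, 3) for k in range(4)}
print(dist)
```

In this small box the possible values are 0, 1, and 2; a sample of three tickets cannot contain three tickets labeled "1" because the box holds only two.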

Suppose a group of 1000 voters contains 5 members of the Libertarian party. We shall take a simple random sample of 100 voters from this group.


When the number of tickets in the box, N, is very large compared with the number of tickets drawn from the box, n; the number of tickets labeled "1" in the box, G, is large compared with the number of tickets labeled "1" drawn from the box, k; and the number of tickets labeled "0" in the box, N−G, is large compared with the number of tickets labeled "0" drawn from the box, n−k, there is not much difference between drawing with replacement and drawing without replacement. Then the hypergeometric distribution and the binomial distribution are close.
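
A numerical comparison illustrates how close the two distributions can be. In this Python sketch the box size, the number of tickets labeled "1," and the number of draws are hypothetical values chosen so that N is much larger than n.

```python
from math import comb

def hypergeometric_pmf(k, N, G, n):
    """Chance of k tickets labeled "1" in n draws without replacement."""
    return comb(G, k) * comb(N - G, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    """Chance of k tickets labeled "1" in n draws with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical box: N = 100,000 tickets, G = 30,000 labeled "1", n = 10 draws.
N, G, n = 100_000, 30_000, 10
largest_gap = max(
    abs(hypergeometric_pmf(k, N, G, n) - binomial_pmf(k, n, G / N))
    for k in range(n + 1)
)
print(largest_gap)
```

Shrinking N toward n (with G/N held fixed) makes the gap grow, which is the situation in which drawing without replacement matters.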

Calculating Binomial, Geometric, Hypergeometric, and Negative Binomial Probabilities

The probability calculator finds the chance that a random variable with a binomial, geometric, negative binomial, or hypergeometric distribution falls in any range. To use the tool, select the distribution you want from the drop-down menu. The display then prompts for the parameters of the distribution, and for the range of values of which to find the probability.

Set the parameter values in the text boxes. Use the checkboxes and the X≥ and X≤ text boxes to select the range for which you seek the probability.

Seventeen tickets will be drawn at random without replacement from a 0-1 box containing 50 tickets of which 20 are labeled "1." What is the probability that the number of tickets labeled "1" in the sample is between 5 and 15, inclusive?

Solution. The number of tickets labeled "1" in the sample is the sample sum of a simple random sample of size n=17 from a 0-1 box containing N=50 tickets of which G=20 are labeled "1," so it has a hypergeometric distribution with parameters N=50, G=20, and n=17. To find the probability, select "Hypergeometric" in the top drop-down menu of the calculator. Set N to 50, G to 20, and n to 17. Tick the box that precedes X≥, type 5 into the box that follows X≥, tick the box that precedes X≤, and type 15 into the box that follows X≤. The display should show that the probability sought is 92.094%.
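
The same probability can be computed without the calculator. The Python sketch below sums the hypergeometric chances over the requested range; `hypergeometric_range` is an illustrative name, not part of the calculator.

```python
from math import comb

def hypergeometric_range(lo, hi, N, G, n):
    """Chance that lo <= X <= hi when X is hypergeometric with parameters N, G, n."""
    # Clip the range to the values X can actually take.
    lo = max(lo, 0, n - (N - G))
    hi = min(hi, n, G)
    favorable = sum(comb(G, k) * comb(N - G, n - k) for k in range(lo, hi + 1))
    return favorable / comb(N, n)

prob = hypergeometric_range(5, 15, N=50, G=20, n=17)
print(f"{prob:.3%}")
```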

Discrete Distributions

The binomial, geometric, hypergeometric, and negative binomial distributions are examples of discrete probability distributions. For discrete probability distributions, the number of values that have nonzero probability is countable. The binomial probability distribution assigns positive probability to the integers {0, 1, …, n}. The geometric distribution assigns positive probability to the positive integers {1, 2, 3, … }. The hypergeometric distribution assigns positive probability to the integers between

max{0, sample size − (population size − #good in population)} and

min{sample size, #good in population}.

All these sets are countable.

There are many other discrete distributions. Some have names; some do not. The following example presents a random variable with a discrete distribution that has no name (as far as I am aware).

Toss a fair coin twice and roll a fair die twice, all independently. Let X be the sum of the number of heads in the two tosses and the number of times the die shows one spot in the two rolls. Even though the random variable X counts "successes" in a fixed number (four) of independent trials, it does not have a binomial probability distribution, because the probability of success is not the same in every trial: It is 1/2 in the trials that involve the coin, and 1/6 in the trials that involve the die. Let us find the probability distribution of X.

The possible values of X are 0, 1, 2, 3, and 4. What is the chance that X = 0? For that to occur, the two die rolls need to show something other than ones, and the coin needs to land tails-up on both tosses. Because all the tosses and rolls are independent, the chance of this is

P(no "ones" and no "heads") = ${}_2C_0 \times (1/6)^0 \times (5/6)^2 \times {}_2C_0 \times (1/2)^0 \times (1/2)^2$ = 25/144.

What is the chance that X = 1? For that to occur, either exactly one die roll needs to give a one and the coin needs to land tails both times, or exactly one coin toss needs to give a head and the die rolls need to give something other than a one both times. These events are mutually exclusive, so we can find the chance of their union by adding their chances. Each of these events has two binomial components: the number of times a one is rolled in the two die rolls, and the number of times the coin lands "heads" in two tosses. Because the rolls and tosses are independent, we can find the chance that "one" is rolled a specific number of times and the coin lands "heads" a certain number of times by multiplying the chances of the number of ones and the number of heads. We have

P(one "one" and no "heads") = ${}_2C_1 \times (1/6)^1 \times (5/6)^1 \times {}_2C_0 \times (1/2)^0 \times (1/2)^2$

= 5/72.

P(no "ones" and one "head") = ${}_2C_0 \times (1/6)^0 \times (5/6)^2 \times {}_2C_1 \times (1/2)^1 \times (1/2)^1$

= 25/72.

The chance that X = 1 is thus 5/72 + 25/72 = 5/12.

What is the chance that X = 2? That can occur three ways: {two "ones" and no "heads"}, {two "heads" and no "ones"}, and {one "one" and one "head"}.

P(two "ones" and no "heads") = ${}_2C_2 \times (1/6)^2 \times (5/6)^0 \times {}_2C_0 \times (1/2)^0 \times (1/2)^2$

= 1/144.

P(one "one" and one "head") = ${}_2C_1 \times (1/6)^1 \times (5/6)^1 \times {}_2C_1 \times (1/2)^1 \times (1/2)^1$

= 5/36.

P(no "one" and two "heads") = ${}_2C_0 \times (1/6)^0 \times (5/6)^2 \times {}_2C_2 \times (1/2)^2 \times (1/2)^0$

= 25/144.

The chance that X = 2 is thus 1/144 + 5/36 + 25/144 = 23/72.

How can X = 3? That can happen two ways: {two "ones" and one "head"} and {one "one" and two "heads"}.

P(two "ones" and one "head") = ${}_2C_2 \times (1/6)^2 \times (5/6)^0 \times {}_2C_1 \times (1/2)^1 \times (1/2)^1$

= 1/72.

P(one "one" and two "heads") = ${}_2C_1 \times (1/6)^1 \times (5/6)^1 \times {}_2C_2 \times (1/2)^2 \times (1/2)^0$

= 5/72.

The chance that X = 3 is thus 1/72 + 5/72 = 1/12.

How can X = 4? That can happen only one way: {two "ones" and two "heads"}.

P(two "ones" and two "heads") = ${}_2C_2 \times (1/6)^2 \times (5/6)^0 \times {}_2C_2 \times (1/2)^2 \times (1/2)^0$

= 1/144.

These are the only possible values of X, so the probability distribution of X is that given in the following table.

Probability distribution of X
x P(X=x)
0 25/144
1 5/12
2 23/72
3 1/12
4 1/144

As a quick check of our arithmetic, we should verify that the probabilities of all the possibilities sum to 100%:

25/144 + 5/12 + 23/72 + 1/12 + 1/144 = (25 + 60 + 46 + 12 + 1)/144

= 144/144

= 100%.

They do!
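
The whole calculation can be organized as a sum over the number of "ones" and the number of heads, each of which has a binomial distribution. The following Python sketch does that with exact fractions; the names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

die, coin = Fraction(1, 6), Fraction(1, 2)
dist = {x: Fraction(0) for x in range(5)}
for ones in range(3):        # number of "ones" in the two die rolls
    for heads in range(3):   # number of heads in the two coin tosses
        # Rolls and tosses are independent, so the chances multiply.
        dist[ones + heads] += binomial_pmf(ones, 2, die) * binomial_pmf(heads, 2, coin)

print(dist, sum(dist.values()))
```

Grouping the nine (ones, heads) combinations by their total is the same bookkeeping done by hand above: mutually exclusive events with the same value of X have their chances added.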


Case Study: Trade Secret Litigation

This example is based on a true story. The names of the people and firms have been changed, but otherwise, the facts are stated as I understand them. The data were presented in an earlier chapter.

On 1 May 1995, Sue Mi and Leigh Teagate, two former employees of WeeBee Hardware (WBH), a firm that sells computer components to computer assemblers and retailers, opened the doors of a new company, Weasel Drives (WD). Sue Mi had worked at WBH up to the day before WD opened its doors; Leigh Teagate had stopped working for WBH about 18 months previously. Both firms are in the greater San Francisco Bay Area. From the time WD started business, it sold essentially the same kinds of computer components that WBH did, mostly to former customers of Sue Mi, at essentially the same prices and with essentially the same credit terms. Indeed, in the first two days WD was in business, Sue Mi called the top dozen of her former accounts at WBH. In its first month of business, WD sold about $1 million of equipment to former customers of WBH; that amount increased to about $2 million/month in the course of a few months.

The principals of WBH sought an injunction against WD to prevent it from selling to customers of WBH, alleging that their customer list was a trade secret and had been misappropriated by Sue Mi and Leigh Teagate. (A further complication is that another former employee of WBH and archrival of the principals of WBH apparently set up WD's business, including renting facilities for them, financing their startup, and advancing about $400,000 of inventory to them without collateral.)

It is well established that a customer list can qualify as a trade secret: It has economic value, and derives its value from not being known generally. Customer lists can be the product of years of soliciting new business by cold-calling tens of thousands of potential customers and winnowing that list down to a few hundred or a few thousand who actually do buy the kind of equipment the firm sells, who will buy it from that firm, and who pay promptly (or at least reliably). With knowledge of a firm's list of customers, a competitor would not need to make cold calls to find potential customers, check credit references, or establish a credit history with a particular customer to learn whether it was creditworthy.

WD asserted in response to WBH's request for an injunction that (1) they found the names of the customers in public sources, such as CD-ROMs that contain lists of businesses, and from computer magazines in which those customers advertise, not from their knowledge of WBH's customers, and that (2) such a large overlap with WBH's customer list was inevitable, because WBH had so many customers.

A California Court of Appeals decision (ABBA Rubber Co. v. Seaquist 286 Cal. Rptr. at 528) establishes that a "readily ascertainable by proper means" affirmative defense to a claim of misappropriation is appropriate under certain circumstances:

[I]f the defendants can convince the finder of fact … (1) that "It is a virtual certainty that anyone who manufactures certain types of products uses rubber rollers, (2) that the manufacturers of those products are easily identifiable, and (3) that the defendants' knowledge of the plaintiff's customers resulted from that identification process and not from the plaintiff's records, then the defendants may establish a defense to the misappropriation claim."

ABBA Rubber Co., 286 Cal. Rptr. at 529, ftnt. 9.

WD thus would be in the clear if they could show that they identified the customers they called from the CD-ROMs and/or magazines without using their knowledge of WBH's customer list. I was retained to calculate the probability that various subsets of WD's customer list would overlap with analogous subsets of WBH's customer list to the extent that they do, under various assumptions. The data, which were introduced earlier, are repeated in the tables below.

Customer Group | At Large | WBH | WD | Overlap
All Customers | >90,000 | 3310 | 132 | 93
Customers outside Bay Area | >60,000 | 1769 | 8 | 8
CD-ROM | >90,000 | 3310 | 31 | 22
Magazine Ads | 469 | 152 | 27 | 26
Phone Records | >60,000 | 2906 | 68 | 53

Recorded calls placed by WD
To the 53 WBH clients | To all 68 clients
1006 | 1050

The CD-ROMs and magazine advertisements contain information that would allow one to determine whether a firm is a potential purchaser of the kinds of equipment both WBH and WD sell, but no additional information that would identify one of the potential purchasers as a better sales prospect than any other. To test the claim of "inevitability," we might just look at the chance that the overlap would be so large if WD used simple random sampling to select from the CD-ROMs and ads a smaller number of such potential customers to contact. This is not to suggest that WD did use or should have used simple random sampling; at issue is the inevitability of the overlap.

Consider the customers WD claimed to have found in magazines. There were 469 advertisers of computer components in the magazines identified in the discovery process, 27 of which were contacted by WD in the first months of business. Suppose WD had selected 27 firms to contact at random without replacement from the 469. The sample would be equally likely to be any subset of 27 of the 469 firms. (There are ${}_{469}C_{27}$ such subsets. The subsequent analysis assumes that WD contacted 27 customers from among the magazine advertisers; the number 27 is taken to be fixed, not random.)

Of the 27 firms whose names WD claim to have identified from the computer magazines, 26 were customers of WBH. Of the 469 computer advertisers in the magazines, 152 were customers of WBH. What is the chance that a simple random sample of 27 firms from 469, of which 152 are customers of WBH, would contain 26 or more WBH customers?

This is the same as the chance that a random sample of 27 tickets drawn without replacement from a 0-1 box of 469 tickets, of which 152 are labeled "1" and the remaining (469 − 152) = 317 are labeled "0," would contain 26 or 27 tickets labeled "1." The probability is given by the hypergeometric distribution:

P(overlap is 26 or more) = P(overlap is 26 or 27)

= P(overlap is 26) + P(overlap is 27)

= $({}_{152}C_{26} \times {}_{469-152}C_1)/{}_{469}C_{27} + ({}_{152}C_{27} \times {}_{469-152}C_0)/{}_{469}C_{27}$

= $({}_{152}C_{26} \times 317 + {}_{152}C_{27})/{}_{469}C_{27}$

= 0.0000000000778%.

There are 10 zeros between the decimal point and the first "7." This is comparable to the chance of winning the grand prize in California Lotto two weeks in a row. The hypergeometric distribution also can be used to calculate the probability that if WD had selected 31 firms from the CD-ROM at random, the 31 firms would contain as many of WBH's customers as it did, or even more. The CD-ROMs contain the names, phone numbers, and addresses of at least 90,000 firms that would be potential purchasers of the kind of equipment WD and WBH both sell. We shall assume that all 3310 of WBH's active customers are among the 90,000+ on the CD-ROM. There were 22 WBH customers among the 31 firms whose names WD claimed to have gotten from the CD-ROM. The chance that a randomly selected set of 31 firms from the CD-ROM would contain 22 or more of WBH's 3310 active customers is:

P(overlap is 22 or more) = P(overlap is 22) + P(overlap is 23) + … + P(overlap is 31)

≤ $({}_{3310}C_{22} \times {}_{86,690}C_9)/{}_{90,000}C_{31} + ({}_{3310}C_{23} \times {}_{86,690}C_8)/{}_{90,000}C_{31} + \cdots + ({}_{3310}C_{31} \times {}_{86,690}C_0)/{}_{90,000}C_{31}$

= 0.0000000000 0000000000 00038%.

(The inequality sign is because there are at least 90,000 computer firms on the CD-ROM.) There are 23 zeros between the decimal point and the "3." This is less than the chance of winning the grand prize in California Lotto 3 weeks in a row. We can also look at the overall overlap of the two firms' customer lists, without regard to the source from which WD claimed to have gotten the names of their customers. In the United States, there are more than 90,000 firms that are potential purchasers of the kind of equipment WBH and WD sell. Of those, 3310 were active customers of WBH, and 132 are active customers of WD; they have 93 customers in common. If WD's 132 customers had been selected randomly from the 90,000+, the chance their list would overlap with WBH's by 93 or more also would be given by the hypergeometric distribution:

P(overlap is 93 or more) = P(overlap is 93) + P(overlap is 94) + … + P(overlap is 132)

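$\le \frac{{}_{3310}C_{93} \times {}_{86{,}690}C_{39}}{{}_{90{,}000}C_{132}} + \frac{{}_{3310}C_{94} \times {}_{86{,}690}C_{38}}{{}_{90{,}000}C_{132}} + \cdots + \frac{{}_{3310}C_{132} \times {}_{86{,}690}C_{0}}{{}_{90{,}000}C_{132}}$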

= 0.0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
00000000013%.

(There are 99 zeros between the decimal point and the "1." The inequality sign is because there are at least 90,000 US computer firms who potentially buy the kind of equipment WD and WBH sell.) This is less than the chance of winning the grand prize in California Lotto 13 weeks in a row.
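Hypergeometric tail sums like these are tedious by hand but straightforward to check numerically. The following is a minimal Python sketch, not the method used to produce the figures in the text. The function name `hypergeom_upper_tail` is hypothetical, and exact rational arithmetic (`fractions.Fraction` with `math.comb`) is a choice made here so that very small probabilities are not lost to rounding. It treats 90,000 as if it were the exact number of firms, which is the assumption that turns the calculation into an upper bound.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(N, G, n, k_min):
    """Chance that a simple random sample of size n from a box of N tickets,
    G of which are labeled "1", has a sample sum of k_min or more."""
    k_max = min(n, G)  # the sample sum cannot exceed n or G
    # math.comb returns 0 when the lower index exceeds the upper index,
    # so impossible values of k contribute nothing to the sum.
    favorable = sum(comb(G, k) * comb(N - G, n - k)
                    for k in range(max(k_min, 0), k_max + 1))
    return Fraction(favorable, comb(N, n))


# Illustrative inputs taken from the text; 90,000 is used as if it were
# the exact number of firms (an assumption: the text says "at least").
cases = {
    "26 or more of 27, box of 469 with 152 ones": (469, 152, 27, 26),
    "22 or more of 31, box of 90,000 with 3310 ones": (90_000, 3310, 31, 22),
    "93 or more of 132, box of 90,000 with 3310 ones": (90_000, 3310, 132, 93),
}
for label, (N, G, n, k_min) in cases.items():
    p = hypergeom_upper_tail(N, G, n, k_min)
    print(f"{label}: {float(p):.3e}")
```

The printed values are probabilities expressed as decimal fractions; multiply by 100 to compare them with percentages.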

We can also look at whether WD selectively pursued WBH customers when they sought business outside the San Francisco Bay Area. There are at least 60,000 firms outside the Bay Area that potentially buy equipment of the kind sold by WD and WBH. 1769 of these are customers of WBH; 8 are customers of WD; all 8 of WD's customers are also customers of WBH. If WD had selected 8 customers at random from the 60,000 or more outside the Bay Area, the chance that all 8 would have been customers of WBH as well would be

$P(\text{overlap is 8}) = \frac{{}_{1769}C_{8} \times {}_{58{,}231}C_{0}}{{}_{60{,}000}C_{8}}$

= 0.000000000056%.

There are 10 zeros between the decimal point and the "5."

WD's telephone billing records were also obtained in discovery. Local calls do not result in specific telephone billing records, so it is not possible to determine the fraction of WD's local calls to customers of WBH. However, long distance and local toll calls (calls beyond a certain radius from WD) do result in records that can be analyzed. We shall call such a call a "recorded" call. In May 1995, WD placed 1050 recorded calls to a total of 68 firms, of which 53 were customers of WBH. 1006 of the 1050 recorded calls were to WBH customers.

Let us assume, for the moment, that WD derived their list of 132 customers legitimately. If WD were not specifically targeting customers of WBH, one would expect that the fraction of WD's calls to customers of WBH would be approximately in proportion to the fraction of WD's customers who are also WBH's customers. That is, one would expect about 93/132 of WD's calls to be to WBH customers, and about 39/132 of their calls to be to non-WBH customers. A plausible model for phone calls, if WD were not specifically targeting WBH's customers, would be that each of WD's recorded calls is placed to a WBH customer with probability 93/132, and to a non-WBH customer with probability 39/132, and that whether a call is to a WBH or non-WBH customer is independent from call to call. In that case (and given that WD placed 1050 recorded calls in all), the number of recorded calls by WD to WBH customers would have a binomial distribution with parameters n = 1050 and p = 93/132. Then

P(1006 or more of 1050 recorded calls are to WBH customers | list of 132 customers overlaps by 93) = P(1006 calls to WBH customers | overlap) + P(1007 calls to WBH customers | overlap) +  … + P(1050 calls to WBH customers | overlap)

$= {}_{1050}C_{1006} \times (93/132)^{1006} \times (39/132)^{44} + {}_{1050}C_{1007} \times (93/132)^{1007} \times (39/132)^{43} + \cdots + {}_{1050}C_{1050} \times (93/132)^{1050} \times (39/132)^{0}$

= 0.0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
000000072%.

There are 97 zeros between the decimal point and the "7." This too is less than the chance of winning Lotto 13 weeks in a row. And it assumes that the list of 132 customers was derived legitimately.
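The binomial tail sum can be checked the same way. This is a hedged sketch rather than the computation behind the printed figure: the function name `binomial_upper_tail` is hypothetical, and passing p as a `Fraction` is a choice made here to keep every term exact before converting the final sum to a floating-point number.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(n, p, k_min):
    """Chance of k_min or more successes in n independent trials, each with
    chance p of success. Pass p as a Fraction to keep the sum exact."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(max(k_min, 0), n + 1))


# Inputs from the text: 1050 recorded calls, 1006 or more to WBH customers,
# with p equal to the fraction of WD's customers shared with WBH.
tail = binomial_upper_tail(1050, Fraction(93, 132), 1006)
print(f"{float(tail):.3e}")
```

The printed value is a probability as a decimal fraction, not a percentage. Because p is an argument, the same function can be rerun with a different value of p to see how sensitive the answer is to that modeling choice.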

A different assumption that also seems reasonable is that the recorded calls to WBH customers would be in proportion to the fraction of WBH customers among those of WD to whom calls would be recorded: 53/68. This would change the answer somewhat: We might then model the number of recorded calls by WD to WBH customers using a binomial distribution with parameters n = 1050 and p = 53/68, which would give

${}_{1050}C_{1006} \times (53/68)^{1006} \times (15/68)^{44} + {}_{1050}C_{1007} \times (53/68)^{1007} \times (15/68)^{43} + \cdots + {}_{1050}C_{1050} \times (53/68)^{1050} \times (15/68)^{0}$

= 0.0000000000
0000000000
0000000000
0000000000
0000000000
0000000026%,

which is also rather small.

We can use the Multiplication Rule to combine this with the chance that the lists overlap to the extent they do, to find the unconditional chance that WD would make so many of its recorded calls to WBH customers, had it selected its customers at random and placed phone calls independently with equal chance of calling each of its customers. To make this calculation, we assume that there are at least 60,000 potential customers to whom calls by WD would result in phone records, and we use the fact that calls by WD to 2906 of WBH's customers would have resulted in phone records. In that case, under the assumptions already stated, with "customers" taken to mean "customers to whom calls by WD would result in phone records,"

P(1006 or more of 1050 recorded calls to WBH customers) =

P(1006 or more of 1050 recorded calls to WBH customers & 0 of 132 customers shared) +

P(1006 or more of 1050 recorded calls to WBH customers & 1 of 132 customers shared) +

P(1006 or more of 1050 recorded calls to WBH customers & 2 of 132 customers shared) +

… +

P(1006 or more of 1050 recorded calls to WBH customers & 132 of 132 customers shared)

= P(1006 or more of 1050 recorded calls to WBH customers | 0 of 132 customers shared) × P(0 of 132 customers shared) +

P(1006 or more of 1050 recorded calls to WBH customers | 1 of 132 customers shared) × P(1 of 132 customers shared) +

P(1006 or more of 1050 recorded calls to WBH customers | 2 of 132 customers shared) × P(2 of 132 customers shared) +

… +

P(1006 or more of 1050 recorded calls to WBH customers | 132 of 132 customers shared) × P(132 of 132 customers shared)

= 0.0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
00000000013%

$= 1.261 \times 10^{-151}$.

This is less than the chance of winning the California Lotto jackpot 20 weeks in a row.
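The sum over all possible overlaps combines a hypergeometric chance with a conditional binomial tail. The sketch below shows one way to organize that calculation under the assumptions stated in the text, with two further assumptions labeled here: the number of potential customers is taken to be exactly 60,000 rather than "at least" 60,000, and all variable and function names are hypothetical. It illustrates the structure of the calculation and is not necessarily how the printed figure was obtained.

```python
from fractions import Fraction
from math import comb

N, G = 60_000, 2906     # assumed exact box size; WBH customers in the box
n_customers = 132       # size of WD's customer list
n_calls, k_min = 1050, 1006


def overlap_chance(k):
    """Hypergeometric chance that exactly k of WD's 132 customers are shared."""
    return Fraction(comb(G, k) * comb(N - G, n_customers - k),
                    comb(N, n_customers))


def calls_tail_given_overlap(k):
    """Binomial chance of k_min or more calls to WBH customers when each
    call goes to a WBH customer with chance k/132."""
    # Every term has the common denominator 132**1050, so the numerators
    # can be summed as exact integers.
    numerator = sum(comb(n_calls, j) * k**j * (n_customers - k)**(n_calls - j)
                    for j in range(k_min, n_calls + 1))
    return Fraction(numerator, n_customers**n_calls)


# Condition on each possible overlap k, then add the pieces.
total = sum(calls_tail_given_overlap(k) * overlap_chance(k)
            for k in range(n_customers + 1))
print(f"{float(total):.3e}")
```

The printed value is a probability as a decimal fraction. Working with integers over a common denominator keeps the arithmetic exact and avoids floating-point underflow in the individual terms.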

Although it is doubtful that anyone would choose customers to call randomly, and although the probability model for telephone calls is not compelling, these calculations clearly refute the defendants' claim that the large overlap of their customer list with that of their former employer was inevitable.

Summary

A random variable is an assignment of numbers to outcomes of a random experiment. If a random variable has a countable number of possible values, it is discrete. The probability distribution of a random variable says, for each possible value of the random variable, what the chance is that the random variable will equal that value. Probability distributions can be given by formulae or tables. If a probability distribution assigns probability 100% to the union of a countable set of outcomes, the probability distribution is said to be discrete. Probability distributions of discrete random variables are discrete.

Consider a box of N tickets of which G are labeled "1" and N−G are labeled "0." The sample sum of the labels on n tickets drawn at random with replacement from the box has a binomial distribution with parameters n and p=G/N; the probability that the sample sum equals k is

${}_{n}C_{k} \times p^{k} \times (1-p)^{n-k}$, for k = 0, 1, … , n.

The number of tickets that must be drawn with replacement until the first time a ticket labeled "1" is drawn has a geometric distribution with parameter p=G/N. The probability that the number equals k is

$p \times (1-p)^{k-1}$, for k = 1, 2, 3, … .

The number of tickets that must be drawn with replacement until the rth time a ticket labeled "1" is drawn has a negative binomial distribution with parameters r and p=G/N. The chance that the number equals k is

${}_{k-1}C_{r-1} \times p^{r} \times (1-p)^{k-r}$, for k = r, r+1, r+2, … .

The sample sum of the labels on n tickets drawn at random without replacement (the sample sum of a simple random sample of size n) has an hypergeometric distribution with parameters N, G, and n. The chance that the sample sum equals k is

${}_{G}C_{k} \times {}_{N-G}C_{n-k} / {}_{N}C_{n}$, for max(0, n−(N−G)) ≤ k ≤ min(n, G).

These are a few of the discrete distributions that have names; there are many other named distributions, and infinitely many that do not have names.
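The four formulas in this summary translate directly into code. The sketch below is illustrative only: the function names and the small example box (N = 10 tickets, G = 4 labeled "1," n = 5 draws) are hypothetical, and exact fractions are used so that the checks are free of rounding error.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Chance that the sample sum of n draws with replacement equals k."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def geometric_pmf(k, p):
    """Chance that the first ticket labeled "1" appears on draw k."""
    return p * (1 - p)**(k - 1)


def negative_binomial_pmf(k, r, p):
    """Chance that the r-th ticket labeled "1" appears on draw k."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)


def hypergeometric_pmf(k, N, G, n):
    """Chance that the sample sum of n draws without replacement equals k."""
    return Fraction(comb(G, k) * comb(N - G, n - k), comb(N, n))


# Hypothetical box: N = 10 tickets, G = 4 labeled "1"; n = 5 draws.
N, G, n = 10, 4, 5
p = Fraction(G, N)

# Each distribution's probabilities add up to 1 over its possible values.
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))        # prints 1
print(sum(hypergeometric_pmf(k, N, G, n)
          for k in range(max(0, n - (N - G)), min(n, G) + 1)))  # prints 1

print(geometric_pmf(3, p))                                  # prints 18/125
# With r = 1 the negative binomial reduces to the geometric distribution.
print(negative_binomial_pmf(3, 1, p) == geometric_pmf(3, p))  # prints True
```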

Key Terms