SticiGui©
Sample Final 2
Suppose I commute to work by bus or subway, depending on the weather. If it is sunny, I take the subway, but if it is raining, I take the bus. If I take the subway, I will be on time. If I take the bus and there is a traffic jam, I will be late. If there is no traffic jam, I will be on time. The unconditional daily chance of a traffic jam is 20%. Suppose I live in a part of the world that does not have seasons, and that the chance it is raining at the time in question is 10% every day.
The chance that I am on time
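A sketch of the reasoning, not an official solution: I am late only when it is raining and there is a traffic jam. The problem does not say whether rain and traffic jams are independent, so the two marginal chances only bound the chance of being late. The function name `on_time_bounds` and the independence case are assumptions added for illustration.

```python
def on_time_bounds(p_rain, p_jam):
    """Bounds on P(on time) when I am late only if it rains AND there is a jam."""
    # Bounds on P(rain and jam) that hold for any dependence between the events.
    late_min = max(0.0, p_rain + p_jam - 1.0)
    late_max = min(p_rain, p_jam)
    return 1.0 - late_max, 1.0 - late_min


low, high = on_time_bounds(0.10, 0.20)

# Assumption NOT stated in the problem: rain and traffic jams are independent.
independent = 1.0 - 0.10 * 0.20

print(low, high, independent)
```

Without the independence assumption, the chance of being on time is only known to lie between the two bounds.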
A 5-card poker hand is "jack high" if it does not contain two or more cards of the same kind, and not all 5 cards are the same suit, and not all 5 cards are consecutive, and the highest card in the hand is a jack. That is, all the cards are different kinds, their 5 kinds are not consecutive, not all 5 cards have the same suit, and the highest card is a jack. The "kind" of a card is the number or picture on the card, without regard to the suit of the card. Suppose the order of the value of the kinds of the cards is 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king, ace (aces are "high" only). The chance that a 5-card hand dealt from a well-shuffled standard deck of 52 cards is "jack high" is closest to
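One way to organize the count, offered as a sketch rather than an official solution: choose the kinds first, then the suits. The variable names are illustrative.

```python
from math import comb

# Kinds ranked below a jack: 2 through 10, which is 9 kinds.
kinds_below_jack = 9

# The hand holds one jack plus 4 distinct kinds chosen from those 9.
kind_sets = comb(kinds_below_jack, 4)

# Exactly one of those sets is consecutive with a jack on top: 7, 8, 9, 10, jack.
kind_sets -= 1

# Each of the 5 cards can have any of 4 suits, minus the 4 one-suit assignments.
suit_choices = 4**5 - 4

jack_high_hands = kind_sets * suit_choices
p_jack_high = jack_high_hands / comb(52, 5)

print(jack_high_hands, p_jack_high)
```

Because aces are high only, no other consecutive run has a jack as its top card, so only one set of kinds is removed.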
A researcher is developing genetic screening for a certain type of cancer. She has discovered that a certain genetic marker is associated with the cancer: 0.5% of the general population (including those with and without the genetic marker) are afflicted with this kind of cancer at some point in their lives, 0.1% of the general population have the genetic marker, and 20% of those with the genetic marker will eventually contract the disease. The researcher develops a genetic test that is 95% accurate: the chance that an individual with the marker tests positive for the marker is 95%, and the chance that an individual without the marker tests negative for the marker is 95%.
Among those who will get the cancer, the fraction who also have the genetic marker is closest to
(Refers to the genetic screening question) The fraction of the general population (including those with and without the marker) that would be expected to test positive for the genetic marker is closest to
(Refers to the genetic screening question) Among the general population, the fraction of individuals who both have the genetic marker, and will eventually contract the cancer, is closest to
(Refers to the genetic screening question) The conditional chance that an individual selected at random from the population really has the marker, given that he or she tests positive for the marker, is closest to
(Refers to the genetic screening question. Hint: $P(A|B) = \left(P(ABC) + P(ABC^{c})\right)/P(B)$.)
Suppose that whether someone will eventually contract the disease is independent of whether the genetic test gives the correct result (the factors that affect whether the test is accurate for a given individual are different from those that affect whether or not the individual will contract the cancer). The conditional chance that someone will eventually contract the disease, given that he or she tests positive for the genetic marker, is closest to
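The genetic screening questions all follow from the multiplication rule and Bayes' rule. The sketch below is one reading of the problem, not an official solution. For the last part it takes A = contracts the cancer, B = tests positive, C = has the marker, and it assumes that test accuracy is independent of cancer within each marker group. All variable names are illustrative.

```python
p_cancer = 0.005               # P(cancer) in the general population
p_marker = 0.001               # P(marker)
p_cancer_given_marker = 0.20   # P(cancer | marker)
sensitivity = 0.95             # P(test positive | marker)
specificity = 0.95             # P(test negative | no marker)

# P(marker and cancer), by the multiplication rule.
p_marker_and_cancer = p_cancer_given_marker * p_marker

# P(marker | cancer).
p_marker_given_cancer = p_marker_and_cancer / p_cancer

# P(test positive), splitting the population into marker / no marker.
p_positive = sensitivity * p_marker + (1 - specificity) * (1 - p_marker)

# P(marker | test positive), by Bayes' rule.
p_marker_given_positive = sensitivity * p_marker / p_positive

# Last part, following the hint: P(A|B) = (P(ABC) + P(ABC^c)) / P(B).
# Assumption: within each marker group, test accuracy is independent of cancer.
p_cancer_no_marker = p_cancer - p_marker_and_cancer
p_cancer_and_positive = (p_marker_and_cancer * sensitivity
                         + p_cancer_no_marker * (1 - specificity))
p_cancer_given_positive = p_cancer_and_positive / p_positive

print(p_marker_given_cancer, p_positive, p_marker_and_cancer,
      p_marker_given_positive, p_cancer_given_positive)
```

Most positive tests come from the large group without the marker, which is why the chance of having the marker given a positive test is so much smaller than the test's 95% accuracy.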
Suppose that A and B are events. Which of the following is/are always true?
i) the chance A occurs is (100% - chance A does not occur).
ii) the chance A and B both occur is the chance A occurs times the chance B occurs.
iii) the chance either A or B occurs is the chance A occurs plus the chance B occurs.
iv) the chance either A or B occurs is at most the chance A occurs plus the chance B occurs.
v) the chance that A and B both occur is the conditional chance that A occurs given that B occurs, times the chance that B occurs.
A box contains 5 tickets labeled with the numbers {-4, -1, 0, 2, 3}. In 100 random draws with replacement from the box, the expected value of the sum of just the positive numbers drawn is closest to
A box contains tickets labeled with the numbers {-4, -1, 0, 2, 3}. In 100 random draws with replacement from the box, the SE of the sum of just the positive numbers on the tickets drawn is closest to
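A sketch of one standard approach to the two box questions: adding up only the positive numbers drawn is the same as replacing every non-positive ticket with 0 and adding up all the draws. The names `new_box`, `expected_sum` and `se_sum` are illustrative.

```python
from math import sqrt

box = [-4, -1, 0, 2, 3]
n_draws = 100

# Relabel the box: tickets that are not positive contribute 0 to the sum.
new_box = [x if x > 0 else 0 for x in box]

mean = sum(new_box) / len(new_box)
sd = sqrt(sum((x - mean) ** 2 for x in new_box) / len(new_box))

# For draws with replacement: EV(sum) = n * mean, SE(sum) = sqrt(n) * SD.
expected_sum = n_draws * mean
se_sum = sqrt(n_draws) * sd

print(expected_sum, se_sum)
```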
Suppose I draw 10 tickets at random with replacement from a box of tickets, each of which is labeled with a number. The average of the numbers on the tickets is 1, and the SD of the numbers on the tickets is 1. Suppose I repeat this over and over, drawing 10 tickets at a time. Each time, I calculate the sum of the numbers on the 10 tickets I draw. Consider the list of values of the sample sum, one for each sample I draw. This list gets an additional entry every time I draw 10 tickets.
i) As the number of repetitions grows, the average of the list of sums is increasingly likely to be between 9.9 and 10.1.
ii) As the number of repetitions grows, the histogram of the list of sums is likely to be approximated better and better by a normal curve (after converting the list to standard units).
iii) As the number of repetitions grows, the SD of the list of sums is increasingly likely to be between 0.9 and 1.1.
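A small simulation can help check intuition about statements like these. It is a sketch under an assumption: the problem does not say what is in the box, so the code uses a hypothetical box {0, 2}, which has average 1 and SD 1. For 10 draws the sample sum has expected value 10 × 1 = 10 and SE √10 × 1 ≈ 3.16.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is reproducible

box = [0, 2]    # hypothetical box with average 1 and SD 1
draws_per_sample = 10
reps = 20000

# One entry in the list for each repetition: the sum of 10 draws with replacement.
sums = [sum(random.choice(box) for _ in range(draws_per_sample))
        for _ in range(reps)]

avg_of_sums = mean(sums)
sd_of_sums = pstdev(sums)

print(avg_of_sums, sd_of_sums, sqrt(draws_per_sample))
```

As the number of repetitions grows, the histogram of the list approaches the probability histogram of the sum of 10 draws. The sample size stays at 10, so nothing in the repetition itself makes that histogram more normal.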
A researcher wishes to determine the fraction of individuals with MBA degrees in executive positions in a certain large corporation. She obtains a complete list of the names of the executives; there are 1000. She takes a simple random sample of 100 of the executives to interview; 60 (60%) of them have MBA degrees. She estimates the number of executives with MBA degrees in the firm as a whole to be 60% of 1000 (namely, 600), with a standard error of
$1000 \times \mathrm{SE}(\text{sample percentage}) = 1000 \times (60\% \times 40\%)^{1/2} / 100^{1/2} = 1000 \times 4.9\% = 49$.
On the average, for estimates obtained this way, you would expect
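The quoted formula treats the 100 interviews as draws with replacement and plugs in the sample percentage for the unknown population percentage. A simple random sample is drawn without replacement, so a finite population correction applies. The sketch below compares the two figures; it is illustrative, not an official solution, and the variable names are assumptions.

```python
from math import sqrt

N = 1000       # executives in the corporation
n = 100        # executives in the simple random sample
p_hat = 0.60   # sample fraction with MBA degrees

# SE of the sample fraction as if drawing with replacement (the quoted formula).
se_with_replacement = sqrt(p_hat * (1 - p_hat)) / sqrt(n)

# Finite population correction for sampling without replacement.
fpc = sqrt((N - n) / (N - 1))

se_count_quoted = N * se_with_replacement
se_count_corrected = N * se_with_replacement * fpc

print(se_count_quoted, se_count_corrected)
```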
Preparation H is a treatment for hemorrhoids. A TV advertisement for Preparation H reports that more people use Preparation H than use all the other leading brands combined, and that more people trust Preparation H. The viewer is "invited" to conclude that Preparation H is therefore the best hemorrhoid treatment on the market. This inference is dubious, for a variety of reasons. The most important is that
Someone I know claims to be able to spin a coin in such a way that he can make it land "heads" 90% of the time, on the average. I want to test the hypothesis that he's bluffing (that it's equally likely that the coin will land "heads" or "tails") against the alternative that he is right. I propose to test this hypothesis by having him spin the coin again and again until it first lands "tails." If it takes more than 4 tries, I'll conclude that he's right. Assume that the spins are independent.
Under the null hypothesis that he cannot influence the outcome, the number of spins until the coin lands "tails" has a
(Refers to coin spinning problem) Under the alternative hypothesis, the expected number of spins to the first "tail" is
(Refers to coin spinning problem) The significance level of this test is closest to
(Refers to coin spinning problem) The power of this test against the alternative hypothesis that my friend is right is closest to
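For independent spins, the number of spins to the first "tail" has a geometric distribution, and the test rejects exactly when the first 4 spins all land "heads." The sketch below works through the coin spinning questions on that basis; the helper name `chance_more_than` is illustrative.

```python
def chance_more_than(k, p_tails):
    """P(first tail takes more than k spins) = P(first k spins are all heads)."""
    return (1 - p_tails) ** k


p_tails_null = 0.5   # null hypothesis: heads and tails equally likely
p_tails_alt = 0.1    # alternative: heads 90% of the time

# Expected value of a geometric distribution: 1 / (chance of success per trial).
expected_spins_alt = 1 / p_tails_alt

# Significance level: chance of rejecting when the null hypothesis is true.
significance = chance_more_than(4, p_tails_null)

# Power: chance of rejecting when the alternative hypothesis is true.
power = chance_more_than(4, p_tails_alt)

print(expected_spins_alt, significance, power)
```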
I desire to estimate the percentage of homes in Berkeley, CA, that are assessed more than $3000 in annual property taxes. I obtain a list of all home addresses in Berkeley, and take a simple random sample of size 100 from the list, which contains several thousand addresses. I then contact the County Tax Assessor's office and obtain copies of the tax bills for those 100 properties. The sample percentage with tax bills over $3000 is 30%. An approximate 95% confidence interval for the percentage of Berkeley homes with tax bills over $3000 is
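A sketch of the usual normal-approximation interval, under two assumptions: the sample percentage stands in for the unknown population percentage in the SE, and the finite population correction is ignored because the list has several thousand addresses. The multiplier 1.96 is the normal-curve value for 95%; courses often round it to 2.

```python
from math import sqrt

n = 100
p_hat = 0.30   # sample fraction with tax bills over $3000

# Estimated SE of the sample fraction.
se = sqrt(p_hat * (1 - p_hat) / n)

z = 1.96       # normal-curve multiplier for about 95% confidence
low = p_hat - z * se
high = p_hat + z * se

print(se, low, high)
```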
A certain project will be undertaken in 6 stages. There is a 95% chance that each stage will be completed on time. The chance that all 6 stages are completed on time is (choose the most precise answer that is true)
(Refers to project question) The expected number of stages that will be completed on time is
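The project question does not say that the stages are independent. The sketch below, offered as illustration rather than an official solution, computes what holds for any dependence and, separately, the value under an added independence assumption. The expected number of on-time stages does not depend on independence.

```python
p_stage = 0.95
stages = 6

# Bounds on P(all stages on time) that hold for any dependence between stages.
# Lower bound: P(at least one late) <= sum of the chances each stage is late.
lower = max(0.0, 1 - stages * (1 - p_stage))
# Upper bound: all on time implies that any single stage is on time.
upper = p_stage

# Assumption NOT stated in the problem: the stages are independent.
independent = p_stage ** stages

# Expected number on time: sum of the individual chances, with or without independence.
expected_on_time = stages * p_stage

print(lower, upper, independent, expected_on_time)
```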
I want to test whether a particular headache remedy is effective. The null hypothesis is that it is not effective; the alternative hypothesis is that it is effective.
(Refers to headache problem.) To test whether the headache remedy works, I propose a randomized, controlled, double-blind experiment: I will find 10 volunteers who suffer regularly from headaches. Each will be given two pills, one of which is a placebo, and one of which contains the remedy I want to test. The pills are labeled "A" and "B." The subjects are instructed to take pill "A" the first time they get a headache, and pill "B" the second time. They are to write down how long it took for their headache to abate each time, and report the result to me (assume that they do this perfectly accurately). For half the subjects, pill "A" is placebo and pill "B" is remedy; for the other half, it's the other way around. Which subjects get remedy first is decided randomly by assigning each subject a random number between zero and one. The 5 subjects who were assigned the largest random numbers get treatment first; the other 5 get placebo first.
Of the two headaches each subject has during the study, one will last longer. It is reasonable to assume that if the remedy does not work, the longer headache would be equally likely to be either the remedy-treated headache or the placebo-treated headache. Assume also that if the remedy does not work, whether the remedy-treated or placebo-treated headache lasts longer is independent for different individuals.
Under these assumptions, if the remedy does not work, the number of subjects whose remedy-treated headaches are shorter than their placebo-treated headaches has
(Refers to headache problem) Suppose I reject the null hypothesis if for 9 or more of the subjects, the remedy-treated headache was shorter than the placebo-treated headache. The significance level of this test is closest to
(Refers to headache problem) Suppose that in reality, the remedy shortens about 50% of all headaches, so that for a given subject, the chance that the remedy-treated headache is shorter than the placebo-treated headache is 75%. For the test just described, the chance of detecting this difference is closest to
(Refers to headache problem) Suppose I perform the experiment, and for all 10 subjects, the remedy-treated headache is shorter than the placebo-treated headache. The P-value of the null hypothesis would be closest to
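Under the stated assumptions, the number of subjects whose remedy-treated headache is shorter is a sum of 10 independent trials with the same chance of success, so binomial tail chances answer the headache questions. The sketch below is illustrative; the helper name `binom_upper_tail` is an assumption.

```python
from math import comb


def binom_upper_tail(n, p, k):
    """P(X >= k) for X binomial with n trials and success chance p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Significance level: reject if 9 or more of 10, when the chance is 50% (null).
significance = binom_upper_tail(10, 0.5, 9)

# Power: same rejection rule, when the chance is 75% (the stated alternative).
power = binom_upper_tail(10, 0.75, 9)

# P-value when all 10 subjects have shorter remedy-treated headaches.
p_value = binom_upper_tail(10, 0.5, 10)

print(significance, power, p_value)
```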
A researcher wishes to estimate some parameter from 5 independent observations. Each observation is equal to the parameter the researcher wishes to estimate, plus chance error. The expected value of the chance error is zero (the measurements are unbiased), and the SE of the chance error (the variability in the measurement process) is known quite accurately from other experiments to be 1. To form a confidence interval for the parameter, with confidence level at least 95%, the researcher should use
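One reading of this question, sketched here as an illustration and not as the official answer: with only 5 observations and nothing known about the shape of the error distribution, the normal approximation is not guaranteed to hold, but Chebyshev's inequality still applies to the sample mean. The variable names are assumptions.

```python
from math import sqrt

n = 5
se_observation = 1.0                  # known SE of a single measurement
se_mean = se_observation / sqrt(n)    # SE of the mean of 5 independent observations

# Chebyshev: P(|mean - parameter| >= k * SE) <= 1 / k^2.
# Setting 1 / k^2 = 5% gives k = sqrt(20), for confidence at least 95%.
k = sqrt(1 / 0.05)
chebyshev_half_width = k * se_mean

# For comparison only: valid if the mean is approximately normally distributed,
# which the problem does not guarantee.
normal_half_width = 1.96 * se_mean

print(chebyshev_half_width, normal_half_width)
```

The Chebyshev interval is wider, which is the price of making no assumption about the error distribution.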
A researcher wishes to determine the effect of dietary fat intake on serum (blood) cholesterol level. He takes random samples of 1,000 adults each from the populations of 80 countries, and makes a scatterplot of average serum cholesterol (milligrams per deciliter; mg/dl) versus average daily fat intake (in grams) for the countries (the scatterplot has 80 points, one for each country). The scatterplot is roughly football-shaped. The correlation coefficient between fat intake and serum cholesterol is 0.8. The regression line has slope 2 (2mg/dl of serum cholesterol per gram of fat per day).
i) For the sample, people who consume one less gram of fat per day than average have a serum cholesterol level about 2mg/dl less than average.
ii) An adult should expect that for each gram of fat he eliminates from his daily diet, his serum cholesterol level will fall by about 2 mg/dl.
iii) Because of ecological correlation, this study has little bearing on the effect of dietary fat on serum cholesterol for individuals.
iv) Because this is an observational study, one should not infer a causal relationship between dietary fat intake and serum cholesterol level from the observed association.
Box A contains 4 tickets labeled with the numbers {0, 0, 0, 1}. Box B contains 4 tickets labeled with the numbers {3, 4, 5, 6}. Consider the normal approximation to the probability distribution of the sum of 10 draws with replacement from each of the boxes.