Real card shuffles. There's some elegant mathematical theory about "random shuffles", e.g. that it takes about 7 shuffles to make a card deck random (see the Bayer-Diaconis paper or the more elementary Aldous-Diaconis paper). The theory assumes a kind of idealized shuffle which isn't quite what the average person actually does. So there are interesting possible experiments where you compare shuffling 2, 3, or 4 times, and then gather statistics (e.g. the shape of bridge hands) from the resulting deals.
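One way to get a baseline for such experiments is to simulate the idealized model. This sketch assumes that model is the Gilbert-Shannon-Reeds (GSR) riffle shuffle, which is the model Bayer and Diaconis analyze. It also assumes a new deck is four suits in blocks of 13 and that cards are dealt round-robin to four players. The function names are illustrative. The sketch reports how often a hand has a chosen shape after k riffles.

```python
import random
from collections import Counter

def gsr_riffle(deck, rng):
    """One riffle shuffle in the Gilbert-Shannon-Reeds model."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))  # cut size ~ Binomial(n, 1/2)
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        a, b = len(left) - i, len(right) - j
        # next card falls from a packet with probability proportional to its remaining size
        if rng.random() < a / (a + b):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out

def hand_shapes(deck):
    """Deal round-robin to 4 players; return each hand's suit counts, longest first.
    Card c belongs to suit c // 13 (a new deck is modeled as suits in blocks of 13)."""
    shapes = []
    for player in range(4):
        suits = Counter(card // 13 for card in deck[player::4])
        shapes.append(tuple(sorted((suits[s] for s in range(4)), reverse=True)))
    return shapes

def shape_frequency(n_shuffles, target=(4, 4, 3, 2), trials=1000, seed=1):
    """Fraction of dealt hands with the target shape after n_shuffles riffles of a new deck."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        deck = list(range(52))
        for _ in range(n_shuffles):
            deck = gsr_riffle(deck, rng)
        hits += sum(shape == target for shape in hand_shapes(deck))
    return hits / (4 * trials)

if __name__ == "__main__":
    for k in (1, 2, 3, 4, 7):
        print(k, round(shape_frequency(k), 3))
```

For reference, a 4-4-3-2 hand has probability about 0.2155 in a perfectly random deal. Dealing the unshuffled deck this way gives every player a flat 4-3-3-3 hand. Data from real shuffles can then be set against these model frequencies.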
Is coin tossing really biased? Persi Diaconis suggests that in coin tossing there is a small bias, maybe 1/100 (a probability of about 0.51 rather than 0.5), towards the coin landing the same way up as it started. See this news item or the journal article. To test this experimentally one would need, say, 40,000 tosses. More importantly, one needs a very strict protocol to ensure scientific respectability. Volunteers? Your chance to get invited onto the Late Late Show!
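The arithmetic behind "say 40,000 tosses" can be sketched with a standard normal-approximation z-test. This is one common choice of test. The sketch assumes the bias means P(same side) = 0.51 instead of 0.5. With n = 40,000 the standard error of the observed proportion is sqrt(0.25/40,000) = 0.0025, so a 0.01 bias would appear as about 4 standard errors. The helper names are hypothetical.

```python
import math

def same_side_z(same, n, p0=0.5):
    """z-statistic for H0: P(coin lands the way it started) = p0, given `same` of `n` tosses."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (same / n - p0) / se

def tosses_needed(bias=0.01, z_alpha=1.96, z_power=1.2816, p0=0.5):
    """Normal-approximation sample size for a two-sided 5%-level test with 90% power
    (z_alpha and z_power are the corresponding standard normal quantiles)."""
    p1 = p0 + bias
    root = z_alpha * math.sqrt(p0 * (1 - p0)) + z_power * math.sqrt(p1 * (1 - p1))
    return math.ceil((root / bias) ** 2)

if __name__ == "__main__":
    print(same_side_z(20_400, 40_000))  # about 4: a 1% bias is 4 standard errors at n = 40,000
    print(tosses_needed())              # a little over 26,000
```

Under these assumptions about 26,000 tosses already give 90% power, so 40,000 leaves some margin.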
Luck-Skill spectrum. The "state fair" example in the handout gives a setting with an adjustable parameter that interpolates between skill and luck. Can you invent another experiment that demonstrates the same point, and get data? The "wine cork" example deals both with luck and skill and with the idea of a "learning curve".
Mixing paper tickets. The 1970 draft lottery, intended to pick birthdays in random order, didn't do a very good job of randomization. This would be a nice reading project: see this site or do a Google Scholar search. The lesson is that physically mixing a large number of objects is much harder than you might think. Here's a course project. Imagine you want to run a lottery with, say, 100 names. You write the names on pieces of paper (e.g. stiff paper like business cards) and put them in some container (e.g. a cardboard box). Then you shake the box, turn it over, and so on, for 5 minutes, and reach in and draw out tickets. My prediction is that this does a bad job of mixing: statistical tests on the results will show that the order of the draws is non-random. It would be very interesting to do enough experiments to estimate how the number of shakes needed to mix grows with the number of tickets. As with the "biased coin tossing" project, it is important to have a strict protocol describing exactly what you're doing.
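One possible test for the ticket experiment is sketched below. It assumes you record, for each ticket, the position at which it went into the box and the position at which it was drawn. The sketch computes the Spearman rank correlation between the two orders, which should be near 0 after a good mix. It then calibrates that correlation with a permutation test. This is one choice among many tests. The classic criticism of the 1970 lottery used a similar idea, comparing draw order with birthday. Function names are hypothetical.

```python
import random

def spearman_rho(order):
    """Rank correlation between insertion position i and draw position order[i].
    `order` must be a permutation of 0..n-1 (no ties), so the closed form applies."""
    n = len(order)
    d2 = sum((i - pos) ** 2 for i, pos in enumerate(order))
    return 1 - 6 * d2 / (n * (n * n - 1))

def permutation_p_value(order, reps=10_000, seed=0):
    """Two-sided p-value: how often a truly random order gives |rho| at least as large."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(order))
    perm = list(range(len(order)))
    count = 0
    for _ in range(reps):
        rng.shuffle(perm)
        if abs(spearman_rho(perm)) >= observed:
            count += 1
    return (count + 1) / (reps + 1)
```

Repeating this for boxes with different numbers of tickets and different shaking times gives the data needed to study how the required mixing grows with the number of tickets.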
Psychology of probability. The book Cognition and Chance by Nickerson has many references to the original research experiments. It's a good source for reading projects and also for possible course projects repeating an experiment.
Studying real queues. There's an elegant and well-developed math theory of queues, but I suspect it doesn't really apply to most waiting lines we encounter in everyday life.
1. Coffee shop. A project for several people is to sit in a coffee shop for several periods of time and record (in as much detail as practical) the times of arrival to waiting lines and the times of service completion. Then compare the data with theoretical models.
2. Online game rooms. Go to (say) www.pogo.com, go into (say) Spades and go into (say) Intermediate. You'll see a list of about 20 game rooms and how many people are in each. Usually some are at or near the maximum allowed (125) and most others are nearly empty. Note that this is the opposite of supermarket lines, which stay roughly balanced because customers tend to choose short lines. What's going on with the game rooms? Well, if you want to find a game to join, it's more sensible to choose an almost-full room. A project is to first gather some data on room occupancies over (say) a 3-hour period, and then formulate and test some conjectures.
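Both queue projects start with simple summaries of the recorded data. The sketch below makes some assumptions.

For the coffee shop, it assumes a single server working first-come-first-served. Each customer's service then starts at the later of their arrival and the previous departure. The observed mean time in the system is compared with the M/M/1 prediction 1/(mu - lambda). Coffee shops with several baristas would need a multi-server model instead.

For the game rooms, it uses a dispersion index: the sample variance of one snapshot's occupancies divided by their mean. This index has expected value 1 when each player independently picks a room uniformly at random, ignoring the 125 cap.

Function names and thresholds are hypothetical.

```python
import statistics

# 1. Coffee shop: single-server FIFO queue compared with M/M/1
def queue_summary(arrivals, departures):
    """arrivals/departures: times (e.g. minutes) for each customer, in arrival order.
    Assumes one server and first-come-first-served service."""
    n = len(arrivals)
    service, server_free = [], float("-inf")
    for a, d in zip(arrivals, departures):
        start = max(a, server_free)  # service begins once the customer is present and the server idle
        service.append(d - start)
        server_free = d
    lam = (n - 1) / (arrivals[-1] - arrivals[0])  # estimated arrival rate
    mu = n / sum(service)                          # estimated service rate
    observed = statistics.fmean([d - a for a, d in zip(arrivals, departures)])
    predicted = 1 / (mu - lam) if mu > lam else float("inf")  # M/M/1 mean time in system
    return lam, mu, observed, predicted

# 2. Game rooms: is occupancy more uneven than random room choice would give?
CAPACITY = 125  # room limit quoted in the text

def room_snapshot_summary(occupancies, near_full=0.9, near_empty=10):
    """Dispersion = sample variance / mean; about 1 under uniformly random room choice,
    far above 1 when players clump into a few rooms."""
    mean = statistics.fmean(occupancies)
    return {
        "near_full": sum(o >= near_full * CAPACITY for o in occupancies),
        "near_empty": sum(o <= near_empty for o in occupancies),
        "dispersion": statistics.variance(occupancies) / mean if mean > 0 else float("nan"),
    }
```

Tracking the dispersion index across snapshots over the 3-hour period is one way to turn "some rooms are full, most are empty" into a number that conjectures can be tested against.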
Accuracy of Benford's law. The project is to get a number of different data sets (say 30) with the "highly variable" property. For each data set calculate
y = an estimate of the difference between the true first-digit distribution and the Benford distribution (after subtracting sampling variation);
x = log(interquartile range) for the original data.
My prediction is that, plotting (x,y) for the different data sets, you will see that y decreases as x increases. Maybe the curve resembles what you would get from lognormal data?
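Here is a sketch of the two quantities. It rests on two assumptions that are not in the project description.

- y is taken as the squared Euclidean distance between the observed first-digit frequencies and Benford's, minus an unbiased estimate of the part due to sampling noise. This is one of several reasonable choices; total variation distance would also work.
- x is read as log(Q3/Q1), the interquartile range on a log scale. Unlike log(Q3 - Q1), this does not change when the data are rescaled. If the literal log(Q3 - Q1) is intended, only one line needs changing.

The loop at the end generates lognormal data as a comparison curve.

```python
import math
import random
import statistics

# Benford probabilities P(first digit = d) = log10(1 + 1/d), d = 1..9
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """Leading digit of a nonzero number, read from its scientific-notation string."""
    return int(f"{abs(x):e}"[0])

def benford_y(data):
    """Squared distance to Benford minus an unbiased estimate of the sampling-noise part
    (so y can come out slightly negative when the data fit Benford closely)."""
    data = [x for x in data if x != 0]  # zeros have no first digit
    n = len(data)
    counts = [0] * 9
    for x in data:
        counts[first_digit(x) - 1] += 1
    y = 0.0
    for c, b in zip(counts, BENFORD):
        p_hat = c / n
        # E[(p_hat - b)^2] = (p - b)^2 + p(1 - p)/n, and E[p_hat(1 - p_hat)/(n - 1)] = p(1 - p)/n
        y += (p_hat - b) ** 2 - p_hat * (1 - p_hat) / (n - 1)
    return y

def log_spread(data):
    """x read as log(Q3 / Q1), the interquartile range on a log scale (an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return math.log(q3 / q1)

if __name__ == "__main__":
    rng = random.Random(0)
    for sigma in (0.3, 0.6, 1.0, 2.0):  # lognormal comparison curve
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(5000)]
        print(f"sigma={sigma}: x={log_spread(sample):.2f}, y={benford_y(sample):.5f}")
```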
Route-lengths in transportation networks. I need the following type of data. Take the 10 [or 20 or 40] largest cities in some state or country, and find the distance between each pair by road [or rail] and in a straight line. See Figure 1 of this paper for an example; I would like to expand the 2 data sets there to 10 data sets. Do this well and get your name on a scientific paper!
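For the straight-line distances, one option is the great-circle distance computed from each city's latitude and longitude with the haversine formula. This sketch assumes a mean Earth radius of 6371 km; distances measured on a projected map would also do. The ratio of road distance to straight-line distance for each pair is a natural summary. The `cities` and `road_km` inputs are hypothetical placeholders for the data you collect.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an assumption; any consistent value works)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance in km via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def route_ratios(cities, road_km):
    """cities: {name: (lat, lon)}; road_km: {(a, b): road distance in km}.
    Returns {(a, b): road distance / straight-line distance}."""
    return {
        (a, b): d / great_circle_km(*cities[a], *cities[b])
        for (a, b), d in road_km.items()
    }
```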
Study some new type of social network
A social network consists of
(i) a specified set of individual people, and
(ii) a specified relationship which two people may have.
Mathematically, this gives you a graph where vertices are people and edges indicate where the relationship holds. Many notions of "relationship" have been studied, but I'm sure there are some that no one has yet thought about.
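Here is a minimal sketch of the graph description. It assumes the data arrive as a list of pairs of people for whom the relationship holds, plus optionally a list of everyone in the set. The graph is stored as adjacency sets. The sketch computes two standard summaries: the degree distribution and the local clustering coefficient. The function names are hypothetical; a library such as NetworkX offers the same summaries.

```python
from collections import defaultdict

def build_graph(pairs, people=()):
    """Undirected graph: vertices are people, an edge means the relationship holds."""
    adj = defaultdict(set)
    for p in people:
        adj.setdefault(p, set())  # include people with no relationships (degree 0)
    for a, b in pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree_distribution(adj):
    """{degree: number of people with that degree}."""
    counts = defaultdict(int)
    for nbrs in adj.values():
        counts[len(nbrs)] += 1
    return dict(sorted(counts.items()))

def clustering(adj, v):
    """Fraction of pairs of v's neighbors who are themselves related."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return links / (k * (k - 1) / 2)
```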
Amazon.com customer book reviews. Sample books with (say) 10-40 reviews. For each review, note date posted and number of favorable votes. There is a strong association between these variables, described in Exploratory data analysis by Robert Huang. But there's scope for much further analysis.
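One simple way to quantify the date/votes association within a single book is sketched below. It assumes you record each review's posting date (ISO format) and its vote count on a common collection date. The sketch converts dates to review ages in days and computes Kendall's rank correlation between age and votes. Ranks are used because vote counts may be very skewed. Computing this separately for each book avoids mixing popular and obscure titles. The helper names are hypothetical.

```python
from datetime import date

def review_ages(posted, collected):
    """Days each review had been online on the collection date (ISO strings, e.g. '2024-03-01')."""
    day = date.fromisoformat(collected)
    return [(day - date.fromisoformat(p)).days for p in posted]

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; tied pairs count as neither."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)
```

A positive tau for a book means older reviews tend to have more favorable votes. The spread of tau across the sampled books is a starting point for further analysis.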
A Wikipedia entry. Write a Wikipedia entry (or entries) for a topic in this course that doesn't have a satisfactory entry.
More risks in everyday life. Look at the Ropeik-Gray book Risk. Choose one kind of risk they don't cover (e.g. child abduction by a stranger; flu pandemic) and write a section in the same style as the book.
Project. Carry out some such prediction scheme. This will require some work in data collection, plus programming skills. Definitely a team project.
1. Consider soccer matches where the score is 1-1 at the end of regulation time. Look at the times the two goals were scored. Are these uniformly random? Or is there a tendency for a "quick equalizer"?
2. Common sense and theory both say that you should take more risks when losing and near the end of the match. For instance, in (American) football, classify interceptions by quarter and by whether they were thrown by the team that was losing or winning at the time. One expects the proportion of interceptions thrown by the losing team in the fourth quarter to be considerably larger than 1/8 (the proportion if uniform). Does the data confirm this prediction? Can you see this effect in other sports?
3. Hot hands. For one player in a basketball game, record the sequence of successes and failures in their shots. Given the total number of successes (say 18 out of 29), do they occur in random order? Almost all sports players believe in some notion of "hot hands". They feel they are sometimes "on top of their form" and sometimes not, so that the pattern of successes is more clumpy than it would be if truly random. But statisticians who have studied this are dubious; the data look pretty random to them. Project: gather some data, perhaps from another sport (e.g. volleyball: kills by spikers). There are standard ways to analyze such data.
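Each of the three sports questions reduces to a standard test. This minimal sketch makes several assumptions:

- Goal times are in minutes on [0, 90], with stoppage-time goals recorded as 45 or 90.
- A hypothetical 10-minute window defines a "quick equalizer".
- Interceptions thrown while the score is tied are left out.
- Shots are coded 1 for a make and 0 for a miss.

Under uniformity, two independent goal times are within g minutes of each other with probability 1 - (1 - g/T)^2. That is 17/81, about 0.21, for g = 10 and T = 90. The interception count is compared with Binomial(n, 1/8). The Wald-Wolfowitz runs test counts streaks, and too few runs suggests a "hot hand". For 18 makes in 29 shots, the expected number of runs is about 14.7.

```python
import math

# 1. Quick equalizers
def uniform_gap_cdf(g, T=90.0):
    """P(|U1 - U2| <= g) for two independent Uniform[0, T] goal times."""
    g = min(max(g, 0.0), T)
    return 1 - (1 - g / T) ** 2

def quick_equalizer_test(goal_pairs, window=10.0, T=90.0):
    """Share of 1-1 games whose goals came <= window minutes apart, versus the uniform
    prediction; returns (observed, expected, z) using a normal approximation."""
    n = len(goal_pairs)
    hits = sum(abs(a - b) <= window for a, b in goal_pairs)
    p = uniform_gap_cdf(window, T)
    return hits / n, p, (hits - n * p) / math.sqrt(n * p * (1 - p))

# 2. Risk-taking when behind
def binomial_upper_tail(k, n, p=1 / 8):
    """Exact P(X >= k) for X ~ Binomial(n, p), e.g. k fourth-quarter interceptions
    by the losing team out of n interceptions."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 3. Hot hands: Wald-Wolfowitz runs test
def runs_test(shots):
    """shots: 1 = make, 0 = miss (needs at least one of each). Returns (runs, expected, z);
    a clearly negative z (too few runs) points to streakiness."""
    n1 = sum(shots)
    n2 = len(shots) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(shots, shots[1:]))
    mean = 1 + 2 * n1 * n2 / n
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, mean, (runs - mean) / math.sqrt(var)
```

The normal approximations are rough for a single game's worth of shots. A permutation test (reshuffling the observed makes and misses) is an exact alternative.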
Given a sample T from Uniform[0, t_0] with unknown t_0, find a frequentist confidence interval or a Bayesian posterior for t_0.
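Both answers have closed forms. This sketch assumes a single observation T. For the Bayesian version it uses the scale-invariant (improper) prior density 1/t_0, which is one common choice and not the only one. The derivations are in the docstrings. Under this prior, the central 95% credible interval [T/0.975, 40T] coincides with the 95% confidence interval.

```python
def uniform_endpoint_ci(T, alpha=0.05):
    """Frequentist (1 - alpha) confidence interval for t0 from one draw T ~ Uniform[0, t0].
    Pivot: T / t0 ~ Uniform[0, 1], so P(alpha/2 <= T/t0 <= 1 - alpha/2) = 1 - alpha,
    which rearranges to T / (1 - alpha/2) <= t0 <= T / (alpha/2)."""
    return T / (1 - alpha / 2), T / (alpha / 2)

def posterior_quantile(T, q):
    """q-quantile of the posterior for t0 under the improper prior density 1/t0 (an assumption).
    The likelihood is 1/t0 for t0 >= T, so the posterior density is T / t0**2 on [T, infinity)
    and P(t0 <= c | T) = 1 - T / c; solving 1 - T / c = q gives c = T / (1 - q)."""
    return T / (1 - q)

if __name__ == "__main__":
    print(uniform_endpoint_ci(1.0))
    print(posterior_quantile(1.0, 0.025), posterior_quantile(1.0, 0.975))
```

The posterior median is 2T, which is a natural Bayesian point estimate to set beside the frequentist interval.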
1. Find consistent data on the s.d. of stock index changes over 1 day, 1 week, 1 month, and 1 year to see how well the square root law works. Then test more subtle predictions of random walk theory, e.g. the arc sine law.
2. In the context of the Kelly criterion for apportioning between stocks, suppose the annual stock gain X can be decomposed as an independent sum X_1 + X_2, where X_1 could be known at some cost (imagine var(X_1) = 0.1 var(X), say). What is the long-term advantage of knowing X_1? Do a simulation study with various distributions. (Conceptual point: this is the simplest model for studying the value of "fundamental analysis".)
3. Look at historical data on annual stock returns and short-term interest rates. See how well the Kelly strategy would have worked, based on modeling the next year's return as a random pick from the previous 20 years' returns.
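The three stock-market projects can be prototyped with a few functions. This sketch rests on several assumptions that are not in the text:

- Index levels are analyzed as log prices.
- Horizons of 5, 21 and 252 trading days stand in for a week, a month and a year.
- The stock fraction f is restricted to a grid on [0, 1], so there is no borrowing or short selling.
- For project 2, the hypothetical model has X_1 = +a or -a with equal probability and X_2 with uniform noise, chosen so that var(X_1) = 0.1 var(X). Cash earns 0%.
- For project 3, each year's short-term rate is known at the start of that year.
- Returns are decimals (0.12 means 12%).

```python
import math
import random
import statistics

# Project 1: square root law
def sd_of_changes(log_prices, k):
    """SD of non-overlapping k-step changes in log price."""
    changes = [log_prices[i + k] - log_prices[i] for i in range(0, len(log_prices) - k, k)]
    return statistics.stdev(changes)

def sqrt_law_ratios(log_prices, horizons=(1, 5, 21, 252)):
    """Observed k-day SD divided by sqrt(k) * (1-day SD); near 1 if the law holds."""
    sd1 = sd_of_changes(log_prices, 1)
    return {k: sd_of_changes(log_prices, k) / (math.sqrt(k) * sd1) for k in horizons}

# Shared Kelly helper
GRID = [i / 100 for i in range(101)]  # fraction in stocks, 0..1 (no borrowing or shorting)

def log_growth(f, stock_returns, rate=0.0):
    """Mean log growth with fraction f in stocks and 1 - f in cash earning `rate`."""
    return statistics.fmean([math.log(1 + rate + f * (r - rate)) for r in stock_returns])

# Project 2: value of knowing X1 (hypothetical distributions)
def value_of_knowing_x1(a=0.05, mu=0.04, s2=0.15, n=4000, seed=0):
    """X = X1 + X2, X1 = +a or -a (prob 1/2 each), X2 = mu + s2 * (unit-variance uniform noise).
    Defaults give var(X1) = 0.1 var(X). Returns (f_ignorant, f_informed, growth gain per year)."""
    rng = random.Random(seed)
    x2 = [mu + s2 * rng.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(n)]
    cases = {x1: [x1 + e for e in x2] for x1 in (-a, a)}
    # Ignorant investor: a single fraction, judged over both equally likely values of X1
    f0 = max(GRID, key=lambda f: sum(log_growth(f, r) for r in cases.values()))
    g0 = sum(log_growth(f0, r) for r in cases.values()) / 2
    # Informed investor: the best fraction separately for each value of X1
    f1 = {x1: max(GRID, key=lambda f: log_growth(f, r)) for x1, r in cases.items()}
    g1 = sum(log_growth(f1[x1], r) for x1, r in cases.items()) / 2
    return f0, f1, g1 - g0

# Project 3: rolling Kelly backtest
def backtest(stock_returns, cash_rates, window=20):
    """Each year t, treat the stock return as a random pick from the previous `window` years,
    choose the Kelly fraction, then apply it to year t's actual returns.
    Returns (final wealth starting from 1, list of fractions used)."""
    wealth, fractions = 1.0, []
    for t in range(window, len(stock_returns)):
        past, rate = stock_returns[t - window:t], cash_rates[t]
        f = max(GRID, key=lambda g: log_growth(g, past, rate))
        fractions.append(f)
        wealth *= 1 + rate + f * (stock_returns[t] - rate)
    return wealth, fractions
```

The `gain` from `value_of_knowing_x1` is the increase in long-run growth rate per year. This is the quantity to weigh against the cost of learning X_1. Rerunning it with other distributions for X_1 and X_2 is the simulation study the project asks for. For project 3, a natural comparison is `backtest`'s final wealth against always holding stocks or always holding cash over the same years.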