Which math probability predictions are actually verifiable?
Action speaks louder than words -- but not nearly as often. Mark Twain
This whole "real world" project started around 2000 when I asked myself:
After 350 years of study of mathematical probability, what
theoretical predictions involving randomness in the real world
are actually verifiable by finding interesting recent data?
In the context of teaching a course, I meant
verifiable by a junior or senior undergraduate majoring in
Statistics, as part of a course project.
As a non-example, statistical physics predicts that the velocities of air
molecules are multivariate Normal, but I don't expect my students to be able to
verify this experimentally.
Similarly, statistical theory says that the effectiveness of
medical treatments is better assessed via randomized controlled experiments
than via anecdotal information found on the Internet; but again I don't
know how students could verify this in a course project.
Ideally:
- The data should be new in the sense of not used for this purpose before.
- Simulations of a model are not data, but can be used to see what a model predicts, to
compare with real-world data.
- The topic should be (at least somewhat) interesting "outside the classroom".
The sum of 100 rolls of a die is not interesting in that way.
Some examples are readily found in introductory textbooks.
I had rather naively assumed that there were many more, and sought to teach a course
by starting each lecture with some interesting data which motivates some less elementary theory.
18 years later I realize this was unduly optimistic.
Freshman math textbook examples
I will not discuss these except to link to some actual data examples.
- The birthday problem prediction, which can be nicely illustrated by baseball teams' active rosters.
- The Poisson distribution approximation for counts of rare occurrences, illustrated by
daily murders in London or World Cup goals by minute.
- Accuracy of opinion polls, illustrated by Field California Poll records.
- The regression effect, illustrated by next year's performance by this year's best and worst teams.
- Benford's law. See this collection of data sets.
- Textbooks are full of "games of chance" calculations for poker, blackjack, craps, roulette, lotteries
etc. Because the accuracy of the models here is uncontroversial, I have never been motivated
to compare theory with data, but you could try with
Powerball statistics, for instance.
- The Bayes calculation of probability of a false positive is usually given with
hypothetical figures, but it is not hard to find real data.
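
As a minimal sketch of what "comparing theory with data" involves for the first item in this list, the following computes the birthday problem prediction and checks it against a simulation of the model. The roster size of 25, the model of 365 equally likely birthdays (leap years ignored), and the function names are all assumptions made here for illustration; none of them come from the linked data.

```python
import random

DAYS = 365  # assumption: birthdays uniform over 365 days, leap years ignored


def p_shared_birthday(n, days=DAYS):
    """Exact probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def simulate_shared_fraction(n, trials=20000, seed=0, days=DAYS):
    """Fraction of simulated groups of size n that contain a shared birthday."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # a repeated value means a shared birthday
            hits += 1
    return hits / trials


if __name__ == "__main__":
    roster_size = 25  # illustrative assumption, not taken from any data set
    print(f"model prediction: {p_shared_birthday(roster_size):.3f}")
    print(f"simulated:        {simulate_shared_fraction(roster_size):.3f}")
```

The simulation only shows what the model predicts. To test the prediction, one would replace the simulated fraction with the observed fraction of real rosters that contain a shared birthday.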
The kind of examples that should be in Senior math textbooks (but rarely are)
These are the kinds of topics I treat in my course, and on this site.