40,000 coin tosses yield ambiguous evidence for dynamical bias

Background

The 2007 Diaconis-Holmes-Montgomery paper Dynamical bias in the coin toss suggests that coin tossing has a particular "dynamical bias" that makes a coin slightly more likely to land the same way up as it started. In brief, whether the coin lands the same way up as it started depends deterministically on the initial parameters of motion imparted at the instant of tossing. Each person's individual "tossing style" gives some probability distribution on the initial parameters, but (unless the spread is unrealistically small) a careful analysis of the physics shows that the resulting overall probability always works out to be 1/2 or greater, though it would presumably vary from person to person. The basic reason is that, instead of rotating around a horizontal axis as one might imagine, a typical tossed coin rotates around a tilted axis which is precessing in 3-space, and this entails a certain degree of "memory" of the initial parameters. Combining theory with data on initial parameters from a small number of tosses obtained via high-speed photography, Diaconis et al. gave a rough estimate of a 0.8% bias (i.e. a 50.8% chance of landing the same way up as started) for a typical tosser, and discussed a number of possible caveats to the theory. It is important to distinguish this subtle 3-dimensional effect ("precession bias"), which persists when the number of rotations is large, from a more obvious 2-dimensional bias when the number of rotations is small ("few rotations bias", see below).

However, no experiment with actual coin tosses has been done to investigate whether the predicted effect is empirically observed. Diaconis et al. noted, correctly, that to estimate the probability with a S.E. of 0.1% would require 250,000 tosses, but this seems unnecessarily precise. Let's work with numbers of tosses rather than percentages. With 40,000 tosses the S.E. for "number landing same way" equals sqrt(40,000 x 1/4) = 100, and the means are 20,000 under the unbiased null and 20,320 under the "0.8% bias" alternative, a gap of 3.2 S.E. So, if the alternative were true, it's quite likely one would see a highly statistically significant difference between the observed number and the 20,000 predicted by the null.
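The arithmetic above is easy to check. The following is a minimal sketch, assuming the usual normal approximation to the binomial; the function names and the choice of a 1% one-sided significance level are our own illustrative assumptions, not part of the original study design.

```python
from math import sqrt
from statistics import NormalDist

def se_of_count(n, p=0.5):
    """Standard error of the number of 'same way up' results in n tosses."""
    return sqrt(n * p * (1 - p))

def power_one_sided(n, p_alt, alpha):
    """Approximate power of a one-sided z-test of p = 1/2 against p = p_alt.

    Uses the normal approximation; alpha is the significance level.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Smallest count that would be declared significant under the null.
    threshold = n * 0.5 + z_crit * se_of_count(n)
    # Probability of exceeding that count if the true probability is p_alt.
    alt = NormalDist(mu=n * p_alt, sigma=se_of_count(n, p_alt))
    return 1 - alt.cdf(threshold)

n = 40_000
se_count = se_of_count(n)                       # S.E. of the count under the null
mean_null = n * 0.5                             # expected count with no bias
mean_alt = n * 0.508                            # expected count with a 0.8% bias
gap_in_se = (mean_alt - mean_null) / se_count   # about 3.2 standard errors
tosses_for_se_0_1_percent = (0.5 / 0.001) ** 2  # tosses needed for S.E. = 0.1%

print(se_count, mean_null, mean_alt, gap_in_se, tosses_for_se_0_1_percent)
print(power_one_sided(n, 0.508, alpha=0.01))
```

Under these assumptions the chance of rejecting the null at the 1% level, if the true bias were 0.8%, comes out at roughly 80%, which is the sense in which a significant result is "quite likely".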

And 40,000 tosses works out to take about 1 hour per day for a semester ...

The experiment

Over the Spring 2009 semester two Berkeley undergraduates, Priscilla Ku and Janet Larwood, undertook to do the required 40,000 tosses. After preliminary experimentation with practical issues, a specific protocol was formulated, described in detail below. Cutting to the chase, here is the complete data-set as a .xlsx spreadsheet (see sheet 2). This constitutes a potentially interesting data-set in many ways: one could compare numerous theoretical predictions about pure randomness (lengths of runs, for instance) with this empirical data. For the specific question of dynamical bias, the relevant data can be stated very concisely:

of 20,000 Heads-up tosses (tossed by Janet), 10,231 landed Heads
of 20,000 Tails-up tosses (tossed by Priscilla), 10,014 landed Tails

Analysis

A first comment is that it would have been better for each individual to have done both "Heads up" and "Tails up" tosses (which was part of the intended protocol, but on this aspect of the protocol there was a miscommunication); this would separate the effect of individual tossing style from any possible effect arising from the physical difference between Heads and Tails. But it is very hard to imagine any such physical effect, so we presume the observed difference (if real rather than just chance variation) is due to some difference between the two individuals' tossing styles.

Applying textbook statistics:

testing the "unbiased" null hypothesis with the combined data, we get z = 2.45 and a (1-sided) p-value < 1%
assuming dynamical bias with possibly different individual biases, and testing the null hypothesis that these two individuals have the same bias, we get z = 2.17 and a (2-sided) p-value = 3%
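For readers who want to reproduce these two numbers, here is a minimal sketch using the counts stated above. The variable names are ours, and the use of a pooled proportion in the two-sample test is an assumption (the unpooled version gives the same z to two decimal places).

```python
from math import sqrt
from statistics import NormalDist

n_each = 20_000
same_janet = 10_231      # Heads-up tosses that landed Heads
same_priscilla = 10_014  # Tails-up tosses that landed Tails

# Test 1: combined data against the unbiased null p = 1/2 (one-sided).
n_total = 2 * n_each
same_total = same_janet + same_priscilla
z_combined = (same_total - n_total / 2) / sqrt(n_total * 0.25)
p_one_sided = 1 - NormalDist().cdf(z_combined)

# Test 2: do the two tossers share the same bias? (two-sided)
# Assumption: standard two-proportion z-test with a pooled estimate of p.
p_pool = same_total / n_total
se_diff = sqrt(p_pool * (1 - p_pool) * (2 / n_each))
z_diff = (same_janet - same_priscilla) / n_each / se_diff
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_diff)))

print(round(z_combined, 2), round(p_one_sided, 4))
print(round(z_diff, 2), round(p_two_sided, 3))
```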

We leave the statistically literate reader to draw their own conclusions. A caveat is that the experiment did not use "iconic tosses" (see below), and we can't really distinguish the possible precession bias from the possible "few rotations" bias, even though there was no visual indication of systematic difference between the two tossing styles.

Finally, for anyone contemplating repeating the experiment, we suggest getting a larger group of people to each make 20,000 iconic tosses, for two reasons. Studying to what extent different people might have different biases is arguably a richer question than asking about overall existence of dynamical bias. And if the "few rotations bias" exists then we would see it operating in both directions for different people, whereas the predicted "precession bias" is always positive.

Iconic tosses and the few rotations bias

We visualize an "iconic toss" done standing; the coin moves roughly vertically up, rising a height of 2 or 3 feet, spinning rapidly, and is caught in the open hand at around the level it was tossed.

The obvious elementary analysis of coin tossing is that a coin lands "same way up" or "opposite way up" according to whether the number r of full rotations (r real, because a rotation may be incomplete) is in [n - 1/4, n + 1/4] or in [n + 1/4, n + 3/4] for some integer n. When the random r for a particular individual has large spread we expect these chances to average out to be very close to 1/2; but when r has small spread, in particular when its mean \mu is not large, one expects a "few rotations bias" toward "same way up" if \mu is close to an integer, or toward "opposite way up" if \mu is close to a half-integer.
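To get a feel for how quickly this bias washes out, here is a minimal sketch that computes the chance of "same way up" under the elementary analysis. It assumes, purely for illustration, that r is Normally distributed with mean mu and standard deviation sigma; that distributional choice and the function name are ours, not something measured in the experiment.

```python
from math import ceil, floor
from statistics import NormalDist

def prob_same_way_up(mu, sigma):
    """P(r lies in [n - 1/4, n + 1/4] for some integer n), r ~ Normal(mu, sigma).

    Illustrative assumption: the number of rotations r is Normal.
    Integers n more than 10 standard deviations from mu are ignored,
    since they contribute negligibly.
    """
    r = NormalDist(mu, sigma)
    lo = floor(mu - 10 * sigma) - 1
    hi = ceil(mu + 10 * sigma) + 1
    return sum(r.cdf(n + 0.25) - r.cdf(n - 0.25) for n in range(lo, hi + 1))

# Small spread: strong bias whose direction depends on the mean.
print(prob_same_way_up(3.0, 0.1))   # mean at an integer: mostly "same way up"
print(prob_same_way_up(3.5, 0.1))   # mean at a half-integer: mostly "opposite"

# Large spread: the chances average out to very nearly 1/2.
print(prob_same_way_up(3.0, 2.0))
```

Note that this captures only the 2-dimensional "few rotations" effect; the precession bias discussed in the Background requires the full 3-dimensional analysis and is not modeled here.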

Detailed protocol

To avoid tiredness when tossing standing up, the participants sat on the floor. One person did a long sequence of tosses (all starting the same way up) while the other recorded the result directly onto the spreadsheet. Tosses where the coin was dropped were disregarded. Dates, times and person tossing were also recorded on the spreadsheet. The coin used was an ordinary dime. Visually, the tosses were typically rather low (maybe 18 inches high), rotating moderately fast, and angled rather than purely vertical.