Philip B. Stark, 1 March 2010 (last updated 27 March 2010)

Post-election audits count the votes in randomly selected batches of ballots by hand.
They compare the result of the hand counts to the final-but-for-the-audit results
(the *apparent* results) for those same batches.
The information an audit provides about the rate of vote-tabulation errors—and hence
about the accuracy of the vote-tabulation system and the electoral outcome—depends
on the sizes of the batches from which the audit sample is drawn.
Smaller batches give more information for the same amount of hand counting.

When the apparent outcome of an election is indeed correct, the amount of hand counting required to confirm it statistically is roughly proportional to the size of the auditable batches of ballots. For instance, if batches all consist of 1,000 ballots, the number of ballots that need to be counted by hand to confirm the electoral outcome is about 1,000 times larger than if each of those batches were subdivided into 1,000 batches consisting of single ballots. This note gives two heuristic explanations for the relative efficiency of smaller batches: estimating the number of coconut jelly beans in 25lbs of assorted jelly beans and estimating the amount of salt in 1,200 ounces of soup stock.

Post-election audits serve many purposes, including quality control and improvement, fraud deterrence and detection, and verifying electoral outcomes. The goal of a risk-limiting post-election audit is to ensure statistically that the final, official electoral outcome of a contest is correct: it gives a probabilistic assurance that the official outcome is the same outcome that a full hand count of the audit trail would show. (The electoral outcome that a full hand count of the audit trail would show is, by definition, the correct outcome.)

I use the term *apparent outcome* to mean the electoral outcome
that will become official unless the audit intervenes.
It is the electoral outcome—the set of winners—after all the votes have been
tabulated, including votes cast in person,
votes cast by mail, and votes cast provisionally.
If the audit gives strong statistical evidence that the apparent
outcome is right, the hand counting can stop and the apparent outcome is reported as the
final official outcome.
If not, the counting continues until either there is strong statistical
evidence that the apparent outcome is right,
or until the entire audit trail has been counted by hand.
If the entire trail is counted by hand, the correct electoral outcome is then known and can be
reported as the final official outcome.

An auditing procedure is *risk-limiting* if it has a known minimum chance of progressing
to a full hand count whenever the outcome is wrong, no matter what caused the outcome to be wrong.
The *risk* is the maximum chance that the audit does not progress to a full hand count when
the apparent outcome is in fact wrong.
(The maximum is over all ways in which the outcome might be wrong, including the possibility
of programming errors, voter errors, and deliberate fraud.)
If a risk-limiting audit stops without a full hand count, then either the electoral
outcome is in fact correct, or something very unlikely occurred—something that has
a chance no larger than the risk limit.

If the apparent outcome is wrong, the true margin of some apparent winner over some apparent loser is actually negative, even though the apparent margin is positive. That could occur, for example, if enough ballots were interpreted by the vote tabulation system as overvotes or votes for an apparent winner when a hand inspection would show that they were cast for an apparent loser.

Of course, there might be offsetting differences that *deflated* the apparent
margin compared to the margin that a full hand count would show.
For instance, the original tabulation might find an undervote when a hand inspection of the
same ballot would show a vote for an apparent winner.
For the apparent outcome to be wrong, the differences that inflated the apparent margin
net of the differences that deflated the apparent margin must be bigger than the
apparent margin.
(In contests with more than two candidates, there's some additional bookkeeping, but the
idea is the same.)

A risk-limiting audit assesses whether the true margin is positive; that is, whether the apparent winners are the real winners. If the hand count of the audit trail in the batches selected at random for audit gives strong statistical evidence that the true margin is positive, that constitutes evidence that the apparent winner really won. Then the audit can stop. If not, hand counting continues. Eventually, either there is strong statistical evidence that the true margin is positive, or there has been a full hand count, which reveals the true electoral outcome.

The strength of the evidence that hand counting a random sample of ballots gives about the true margin depends on many things, including the apparent margin, the discrepancies between the hand count and the apparent subtotals for the batches in the sample, the number of ballots in the sample, the sizes of the batches from which the sample is drawn, and the way the random sample is drawn (the sampling design). This note concentrates on the effect of the sizes of the batches.

Post-election audits are generally based on drawing random samples of batches of ballots (or auditable records, such as voter-verifiable paper audit trails, VVPATs). Before drawing the audit sample, all the ballots in the contest are divided into batches, and vote subtotals for each batch are determined and reported (or, if not reported, committed to irrevocably; see below). The way batches are defined matters: The amount of evidence the hand count gives depends not only on how many ballots are counted but also on how those ballots were selected.

For instance, suppose that 50,000 ballots were cast in a contest, 500 ballots in each of 100 precincts. Consider three ways of drawing 500 ballots to tally by hand:

- Draw one of the 100 precincts at random. Tally the votes on the 500 ballots cast in that precinct by hand.
- Divide each precinct into 10 batches of 50 ballots. Select 10 of the resulting 1,000 batches at random, without replacement. Tally the votes on the 500 ballots in those 10 batches by hand.
- Draw 500 ballots at random without replacement from the 50,000 as a whole. (Conceptually, put all 50,000 ballots in a big basket, stir them thoroughly, and draw 500 without replacement.) Tally the votes on those 500 ballots by hand.

The first approach gives much less reliable information about the vote tabulation errors in the contest as a whole than the last approach does. The second is intermediate. I hope that the two food examples below help explain why. For more information about sampling, see the relevant chapters of SticiGui.

In current election audits, a batch typically consists of all the ballots cast
in a given precinct.
That is convenient because vote tabulation systems are designed to report subtotals for
precincts.
However, that means batches can contain 1,000 ballots or more.
Audits could be more effective with far less hand counting if the ballots were divided
into smaller batches.
Ideally, each ballot is a batch of its own, as in the third approach above;
this is called *single-ballot auditing*.
Current vote tabulation systems cannot report their interpretation of individual ballots,
which is a prerequisite for single-ballot auditing.
(However, see
Stark, P.B., 2009.
Efficient post-election audits of
multiple contests: 2009 California tests. Refereed paper presented at
the 2009 Conference on Empirical Legal Studies.
http://ssrn.com/abstract=1443314.)

There are 100 4oz bags of various flavors of jelly beans—25lbs in all. Some bags have assorted flavors, some only a single flavor. All jelly beans weigh essentially the same. I love coconut jelly beans, and want to estimate how many there are among the 25lbs. Consider two approaches.

- The 100 bags are poured into a large pot and stirred well. Then 4oz of jelly beans are drawn without looking. Estimate the total number of coconut jelly beans to be the number in the sample, times 100.
- One of the 4oz bags is selected at random. Estimate the total number of coconut jelly beans to be the number in that bag, times 100.

Both estimates are *statistically unbiased*, but the first has much lower *variability*.
(Statistical bias is explained below; also see
SticiGui.)
Mixing disperses the coconut jelly beans pretty evenly throughout the pot.
The sample is likely to contain coconut jelly beans in roughly the same proportion as the
100 bags do overall, so multiplying the number in the sample by 100 gives a reasonably
reliable estimate of the total.

In contrast, a bag selected at random could easily contain only coconut jelly beans (if any of the bags has only coconut) or no coconut jelly beans (if any of the bags has none). Since the bags can have quite different fractions of coconut jelly beans, a 4oz bag selected the second way is quite likely to contain coconut jelly beans in a proportion quite different from the overall proportion of coconut jelly beans, so multiplying that number by 100 could easily be far from the total number of coconut jelly beans among the 100 bags.

Even though both procedures sample 4oz of jelly beans "at random," they do not give equally
reliable estimates of the total number of coconut jelly beans among the 25lbs.
For the first approach, we can get a reliable estimate of the number.
But the second method is unreliable.
To get a reliable estimate using the second
approach—that is, counting the coconut jelly beans in randomly selected bags—we would
need to look at quite a few bags (quite a few *clusters*, see below), not
just one.
It's more efficient to mix the beans before selecting 4oz.
Then 4oz suffices to get a reasonably reliable estimate.
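The difference in variability is easy to see in a quick simulation. The sketch below (Python) compares the spread of the two estimators under one hypothetical bag composition of my choosing: 100 beans per 4oz bag, 20 bags of pure coconut, and 80 bags with no coconut, so 2,000 coconut jelly beans in all.

```python
import random
import statistics

random.seed(12345)

# Hypothetical population: 100 bags x 100 beans; 20 bags are all
# coconut, 80 bags contain none -- 2,000 coconut jelly beans in all.
BEANS_PER_BAG = 100
population = [1] * 2000 + [0] * 8000  # 1 = coconut, 0 = other flavor

scoop_estimates = []  # approach 1: mix everything, scoop 100 beans
bag_estimates = []    # approach 2: pick one bag at random

for _ in range(2000):
    # Approach 1: a simple random sample of 100 beans, scaled up by 100.
    scoop = random.sample(population, BEANS_PER_BAG)
    scoop_estimates.append(sum(scoop) * 100)
    # Approach 2: one random bag (all-coconut with probability 20/100).
    bag_count = BEANS_PER_BAG if random.randrange(100) < 20 else 0
    bag_estimates.append(bag_count * 100)

print("scoop: mean", round(statistics.mean(scoop_estimates)),
      "sd", round(statistics.stdev(scoop_estimates)))
print("bag:   mean", round(statistics.mean(bag_estimates)),
      "sd", round(statistics.stdev(bag_estimates)))
```

Both estimators average close to the true total of 2,000 (both are unbiased), but in this scenario the one-bag estimate is either 0 or 10,000, so its spread is roughly ten times that of the scoop estimate.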

We have 100 12-ounce cans of stock, of a variety of brands, styles, and types: chicken, beef, vegetable, low salt, regular, etc. We want to know how much salt there is in all 1,200 ounces of stock as a whole. The assay to measure salt ruins the portion of the stock that is tested: The more you test, the less there is to eat. Consider two approaches:

- Open all the cans, pour the contents into a cauldron, stir well, and remove a tablespoon of the mix. Determine the amount of salt in that tablespoon, multiply by the total number of tablespoons in the 100 cans (1T = 0.5oz, so the total number of tablespoons in the 100 cans is 12×100×2 = 2,400T).
- Select a can at random, determine the amount of salt in that can, and multiply by 100.

Both estimates are *statistically unbiased*.
However, the first estimate has much lower *variability*:
That single tablespoon is extremely likely to have some stock from all 100 cans.
The salt is likely to be spread out quite evenly through all the stock in the cauldron;
it is very unlikely that the tablespoon will consist almost entirely of salt or almost
entirely of water.
Rather, the tablespoon is likely to contain salt in roughly the same concentration as the
100 cans do on the whole.

In contrast, a can selected the second way can be quite likely to contain salt in a concentration quite different from the 1,200 ounces of stock as a whole, unless all the cans have nearly identical concentrations of salt.

For the first approach, we can get a reliable estimate of the total salt
from a single tablespoon (1/2 ounce) of stock.
But for the second approach, even 12 ounces of stock is not enough to get a reliable
estimate.
(A tablespoon from the selected can would suffice to determine the
salt *in that can* accurately, if the contents of the can have been stirred well.
But even the *exact* amount of salt in the can does not give a reliable estimate of the
salt in the 100 cans as a whole, because there is no mixing across cans.)
The first approach gives a more reliable result at lower "cost": It spoils less
stock.

To get a reliable estimate using the second approach—that is, without mixing
the stock from different cans together—we would need to assay quite a few
cans selected at random (quite a few *clusters*, see below).
A single can is not enough, even though it contains 24 tablespoons of stock—far more than
we need in the first approach.
Sampling many randomly selected cans would amount to mixing the stock after the fact,
but even then, it isn't mixed as well as it is in the first approach.
It's more efficient and cheaper to mix the stock before selecting the sample.
Then a single tablespoon suffices to get an accurate estimate.

A vote-tabulation error that overstated the apparent margin is like
a coconut jelly bean or a fixed quantity of salt.
A precinct or other audit batch is like a bag of jelly beans or a can of stock.
Drawing the audit sample is like selecting a bag or a 4oz scoop of jelly beans
or a tablespoon or can of stock.
(An important difference between auditing and these food examples
is that in elections there can be differences that
*decrease* the apparent margin:
What matters for verifying election outcomes is the
net increase of the apparent margin over the margin that a full hand count would show,
after differences that decreased the apparent margin are subtracted.
Both differences that increase the apparent margin and differences that decrease the
apparent margin matter for quality control and fraud detection.)
Counting ballots by hand has a cost: The more you have to count, the greater the cost.
Hence, you want to count as few ballots as possible as long as you can still determine whether
the electoral outcome is correct.
Similarly, counting jelly beans or assaying the salt in the soup has a cost.

In these two food examples, the first approach is like single-ballot or small-batch auditing. All the ballots are mixed together well, and we draw enough ballots to get a good idea of the net inflation of the margin on average across geography, voting methods, etc. Mixing the stock or the jelly beans is like mixing the ballots in a huge vat, then reaching in and selecting some at random. The resulting sample of ballots is more likely to show about the same net rate of differences that increased the apparent margin as there are in the contest as a whole, compared to the net rate of differences in a sample that consists of the same number of ballots drawn as whole precincts.

In both examples, the second approach is like auditing using precincts or other
large batches of ballots.
There could be large concentrations of differences that increased the apparent margin
in a small number of batches, because there is no
"mixing" across batches.
A single batch drawn using the second approach doesn't tell us much about the overall
rate of margin-inflating differences in the contest, no matter how large the batch is (within reason).
This approach is like rubber-banding big bundles of ballots together, mixing the collection of bundles,
and selecting bundles at random, with no movement of ballots across bundles.
To compensate for the lack of mixing across batches of ballots, we need to
look at a lot of batches, just like we need to count the coconut jelly beans in many bags
or assay many cans of soup if we don't
mix their contents together *across clusters* before drawing the sample.
The first method is much more efficient.

Suppose we have 50,000 ballots in all, 500 ballots cast in each of 100 precincts. Among them, 1,000 (i.e., 2%) have been interpreted incorrectly by the tabulator; the tabulator counted them for the apparent, unconfirmed winner but a manual count would show them to be for the apparent loser. The other 49,000 ballots were tallied correctly.

Suppose we will draw a random sample of 500 ballots to count by hand. We have the ability to check the machine subtotal for those 500 ballots against the manual subtotal. If any of the misinterpreted ballots is among the 500, the machine and hand counts will not match, so we will know that at least one ballot has been misinterpreted. We will consider the three sampling schemes mentioned above, all of which select 500 ballots: drawing a precinct at random, drawing 10 batches of 50 ballots at random without replacement, and drawing 500 individual ballots at random without replacement (a simple random sample).

The next two subsections address two questions:

- What is the chance that the sample contains any misinterpreted ballots at all?
- What is the chance that the percentage of misinterpreted ballots in the sample is at least half the percentage of misinterpreted ballots in the contest as a whole?

We will calculate the chance that the machine and hand counts do not match for three ways of drawing the sample: (i) drawing a single batch of 500 at random, that is, drawing a precinct's worth of ballots, (ii) dividing the precincts into 10 batches of 50 ballots each and drawing 10 batches at random without replacement from the resulting 1,000 batches, and (iii) drawing a simple random sample of 500 ballots.

For method (iii), the chance does not depend on how the misinterpreted ballots are spread across precincts: It is about 99.996%, no matter what. But for methods (i) and (ii), the chance does depend on how many incorrectly interpreted ballots there are in each batch. To illustrate that dependence, we will calculate the chance for several ways of spreading the misinterpreted ballots across batches. For simplicity, assume that when a precinct is divided into 10 batches, the number of misinterpreted ballots in each of those 10 batches is the same. For instance, if the precinct has 20 misinterpreted ballots, each of the 10 batches has 2 misinterpreted ballots.

misinterpreted ballots by precinct | randomly selected precinct of 500 | 10 randomly selected batches of 50 | simple random sample of 500
---|---|---|---
10 in every precinct | 100% | 100% | 99.996%
10 in 98 precincts, 20 in 1 precinct | 99% | ≈100% | 99.996%
20 in 50 precincts | 50% | 99.9% | 99.996%
250 in 4 precincts | 4% | 33.6% | 99.996%
500 in 2 precincts | 2% | 18.4% | 99.996%

Phrased differently, the "confidence" (a slight abuse of the statistical term) that no more than 2% of the 50,000 ballots were misinterpreted if none of the ballots in the sample were misinterpreted is as follows:

sampling method | randomly selected precinct of 500 | 10 randomly selected batches of 50 | simple random sample of 500
---|---|---|---
"confidence" | 2% | 18.4% | 99.996%

Even though 500 randomly selected ballots are counted by hand in every case, the probability of finding a misinterpreted ballot varies enormously. In the case most favorable to precinct-based sampling, hand counting a single randomly selected precinct is guaranteed to find a misinterpreted ballot (10, in fact). But the chance falls quickly as the misinterpreted ballots are concentrated into fewer precincts. In the case least favorable to precinct-based sampling, the chance is only 2% for a randomly selected precinct and 18.4% for 10 randomly selected batches of 50—but remains 99.996% for simple random sampling. The smaller the batches, the greater the minimum chance the sample will show that at least one ballot was misinterpreted.
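The least-favorable entries can be reproduced exactly. In each design, the number of "bad" clusters in the sample is hypergeometric, and the chance of catching at least one misinterpreted ballot is one minus the chance that the sample contains no bad cluster. A sketch in Python for the hardest row, 500 misinterpreted ballots in each of 2 precincts (`math.comb` does the exact integer arithmetic):

```python
from math import comb

def p_hit(total, bad, draws):
    """Chance that `draws` clusters, sampled without replacement from
    `total` clusters of which `bad` contain misinterpreted ballots,
    include at least one bad cluster."""
    return 1 - comb(total - bad, draws) / comb(total, draws)

# 1,000 misinterpreted ballots, all in 2 of the 100 precincts.
p_precinct = p_hit(100, 2, 1)         # one whole precinct of 500
p_batches  = p_hit(1000, 20, 10)      # 10 batches of 50; 20 bad batches
p_ballots  = p_hit(50000, 1000, 500)  # simple random sample of 500

print(round(p_precinct, 3), round(p_batches, 3), round(p_ballots, 5))
# roughly 0.02, 0.184, and 0.99996
```

Shrinking the clusters from 500 ballots to 50 to 1 raises the detection probability from 2% to 18.4% to 99.996%, matching the last row of the table.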

If the misinterpretations were caused by equipment failure in the precinct, that might be expected to concentrate errors in only a few precincts. If the misinterpretations occurred because pollworkers accidentally provided voters with pens with the wrong color ink to mark the ballots, that too might be expected to concentrate errors in only a few precincts. If a fraudster were trying to manipulate the outcome, he or she might target the ballots in only a few precincts, either to avoid detection or for logistical simplicity. In these three hypotheticals, if the sample is drawn by selecting an entire precinct, it could easily be squeaky clean. But with the same counting effort, the chance of finding at least one error if the 500 ballots are drawn as a simple random sample remains extremely high, 99.996%, whether the misinterpreted ballots are concentrated in only a few precincts or spread throughout all 100.

While efficient risk-limiting auditing methods use the data in more complicated ways than simply asking "is there any error at all in the sample?," the amount of information the sample carries about the total number of errors depends strongly on how the sample is drawn. The smaller the clusters are, the harder it is to hide error—even though there are ways of scattering errors that make it easy for all three sampling methods to find at least one error.

The percentage of misinterpreted ballots in the sample is not necessarily a reliable or accurate estimate of the percentage of misinterpreted ballots in the contest. The previous subsection shows that the chance of finding even a single misinterpreted ballot can be quite low when the percentage of misinterpreted ballots in the contest is 2%. When the sample doesn't find any error, the percentage of misinterpreted ballots in the sample is zero: The percentage in the sample underestimates the percentage in the contest by 100%.

But even when the sample does find some misinterpreted ballots, the percentage of such ballots in the sample can be much lower than the percentage in the contest as a whole. When that happens, we might conclude erroneously that the outcome of the contest is right when in fact it is wrong.

How likely is the percentage of misinterpreted ballots in the sample to be at least half the percentage of misinterpreted ballots in the contest as a whole? That is, what is the chance that the percentage of misinterpreted ballots in the sample is at least 1% when the percentage in the contest as a whole is 2%? The following table gives the answer for the same set of scenarios.

misinterpreted ballots by precinct | randomly selected precinct of 500 | 10 randomly selected batches of 50 | simple random sample of 500
---|---|---|---
10 in every precinct | 100% | 100% | 97.2%
10 in 98 precincts, 20 in 1 precinct | 99% | ≈100% | 97.2%
20 in 50 precincts | 50% | 62.4% | 97.2%
250 in 4 precincts | 4% | 5.7% | 97.2%
500 in 2 precincts | 2% | 18.4% | 97.2%

Phrased differently, the "confidence" (a slight abuse of the statistical term) that no more than 2% of the 50,000 ballots were misinterpreted if 1% of the ballots in the sample were misinterpreted is as follows:

sampling method | randomly selected precinct of 500 | 10 randomly selected batches of 50 | simple random sample of 500
---|---|---|---
"confidence" | 2% | 18.4% | 97.2%

As before, even though 500 randomly selected ballots are counted by hand in every case, the probabilities vary widely. In the case most favorable to precinct-based sampling, hand counting a single randomly selected precinct is guaranteed to reveal that at least 1% of the ballots were misinterpreted (in fact, it will show that 2% were). But the chance falls quickly as the misinterpreted ballots are concentrated into fewer precincts. In the case least favorable to precinct-based sampling, the chance is only 2% for a randomly selected precinct and 18.4% for 10 randomly selected batches of 50—but remains 97.2% for simple random sampling. Using smaller batches increases the chance that the percentage of misinterpreted ballots in the sample will be close to the percentage of misinterpreted ballots in the contest as a whole. Smaller batches yield more reliable estimates.
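The 97.2% entries in the simple-random-sample column can also be computed exactly. The number of misinterpreted ballots in a simple random sample of 500 is hypergeometric, and the entry is the chance that it is at least 5 (that is, at least 1% of the 500). A Python sketch:

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    """Chance that a simple random sample of n from a population of N
    containing K misinterpreted ballots includes exactly k of them."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# 50,000 ballots, 1,000 misinterpreted, sample of 500.
# Chance the sample is at least 1% misinterpreted (at least 5 ballots):
p = 1 - sum(hypergeom_pmf(50000, 1000, 500, k) for k in range(5))
print(round(p, 3))  # about 0.972
```

As the table shows, this chance does not depend on how the misinterpreted ballots are spread across precincts; only the cluster-sampling columns do.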

We start with some statistical terminology.
There is a *population* we want to study.
*Population* is a term of art.
It need not consist of people.
In election auditing, the population is the collection of ballots cast in the contest
(or the auditable records corresponding to the ballots, such as VVPATs).
In the food examples above, one population is 1,200 ounces of soup stock;
the other is 25lbs of jelly beans.

For illustration, suppose that in the contest in question, 500 ballots were cast in each of 100 precincts, 50,000 ballots in all. The population will consist of these 50,000 ballots.

There is some property of the population we are interested in.
That property is called a *parameter*.
In the case of election auditing, the parameter is the net inflation of the apparent margin
compared to the margin a full hand count would show.
In the food examples, one parameter is the total number of coconut jelly beans;
the other is the total amount of salt in the stock.

We want to learn about the parameter without examining every member of the population.
Instead, we will look at a subset of the population, called a *sample*.
Samples can be drawn in countless ways.
We will consider two: *simple random samples* and *random cluster samples*.

A simple random sample is one in which every subset (of a predetermined size) of the population is equally likely to be drawn. Simple random samples are drawn "without replacement." For instance, a simple random sample of 4oz of jelly beans is one in which every 4oz is equally likely to be drawn. Such a sample can be drawn by mixing the jelly beans together really well, then reaching in without looking and scooping out 4oz. A simple random sample of one tablespoon (0.5 ounces) of soup stock can be drawn by putting all the stock in a big cauldron, stirring it well, then dipping in a tablespoon.

A simple random sample of 500 ballots can be drawn from a set of 50,000 ballots by putting the ballots in a huge basket, stirring them really well, and drawing 500 without looking. (That turns out to be a terrible way to try to draw a simple random sample, because it's really hard to stir ballots. A better way is to put the ballots in some order, make a list of 50,000 random numbers, and take the sample to be the ballots corresponding to the 500 largest random numbers. For instance, if the 17th random number is the biggest, the 17th ballot would be in the sample.)
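The random-number recipe takes only a few lines of Python (a sketch; here the ballots' "order" is just their position in a list):

```python
import random

random.seed(20100327)  # any fixed seed makes the draw reproducible

N_BALLOTS = 50000
SAMPLE_SIZE = 500

# Assign each ballot a random number, then take the ballots with the
# 500 largest numbers.  Every set of 500 ballots is equally likely,
# so this is a simple random sample.
tickets = [(random.random(), ballot) for ballot in range(N_BALLOTS)]
tickets.sort(reverse=True)
sample = sorted(ballot for _, ballot in tickets[:SAMPLE_SIZE])

print(len(sample), sample[:5])
```

The standard library's `random.sample(range(N_BALLOTS), SAMPLE_SIZE)` draws an equivalent sample in one call; the version above just makes the random-number bookkeeping explicit.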

A cluster sample is one in which the population is partitioned into non-overlapping groups, called clusters; then a cluster (or a predetermined number of clusters) is drawn at random. A cluster sample of 4oz of jelly beans can be drawn by dividing the beans into 100 4oz bags, then picking one or more of those bags at random. A cluster sample of 12oz of soup stock can be drawn by dividing the soup into 100 12oz cans, then picking one or more cans at random. A cluster sample of 500 ballots could be drawn by picking one of the 100 precincts at random, and taking the sample to be the 500 ballots cast in that precinct. A simple random sample is the same as a cluster sample using clusters of size one.

If we estimate the total salt by stirring all the stock together, drawing a tablespoon
at random, and multiplying the salt in the tablespoon by 2,400, that estimate is
likely to be off by some amount.
If we estimate the total salt in the stock by selecting a can at random and multiplying the
amount of salt in the can by 100, the estimate is also likely to be off by some amount.
The amount by which the estimate is off is called *sampling error*.
The sampling error will tend to be much smaller in the first case, where the cans are mixed together
before the sample is drawn, even though the sample is much smaller (there is 1/24 as much stock
in a 0.5oz tablespoon as in a 12oz can).

Similarly, if we estimate the net inflation of the margin in the contest to be 100 times the net inflation of the margin in a sample of 500 ballots, that estimate is likely to be off by some amount, the sampling error, owing to the luck of the draw. The sampling error will tend to be smaller if the 500 ballots are a simple random sample than if they are a cluster sample consisting of all the ballots in a single precinct selected at random.

Sampling error tends to be smaller on average for simple random samples than for cluster samples. (There are exceptions, depending on how the clusters are formed. If the clusters are themselves random samples from the population and a single cluster is drawn, there's no difference between a cluster sample and a simple random sample. If the clusters are constructed so that they exactly match the population, the sampling error will be smaller for a cluster sample than for a simple random sample.) So, for instance, 100 times the net inflation of the margin in a simple random sample of 500 ballots typically tends to be closer to the total net inflation of the margin for the contest than 100 times the net inflation in a cluster sample of 500 ballots will tend to be.

In estimating parameters from samples, in addition to sampling error there is generally
*statistical bias*, also called *systematic error* or
*non-sampling error*.
In the examples given here, the statistical bias is zero.
That is because in these examples, every member of the population has the
same chance of being selected for the sample: the *expected value* of the sample mean
is then the population mean (see
SticiGui
for more explanation).
If we repeated the procedure over and over, selecting one can or one tablespoon of stock
at random, determining the amount of salt in that sample, and multiplying the
result by 100 or 2,400, the average of those results would tend to
get closer and closer to the total amount of salt in the 100 cans.
However, the individual estimates based on the tablespoon drawn from well stirred stock
would tend to be much closer to the truth than the individual estimates based on drawing
a can at random would tend to be.

The same is true for election auditing: Drawing a simple random sample of 500 individual ballots and multiplying the net inflation of the margin in that sample by 100 will give a number that tends to be much closer to the net inflation of the margin in the whole contest than drawing a precinct of 500 ballots and multiplying the net inflation of the margin in that precinct by 100.

These examples contrast a cluster sample with a simple random sample, which is the extreme case of a cluster sample: clusters of size one. Intermediate cluster sizes give results with intermediate reliability. The inefficiency of cluster samples comes from the fact that the random sampling doesn't "mix" across the boundaries of clusters. The smaller the clusters are, the less that matters.

The smaller the clusters, the closer a cluster sample is to a simple random sample containing the same fraction of the population. For instance, imagine dividing the 50,000 ballots into 1,000 clusters of 50 ballots each instead of 100 clusters of 500 ballots each. Suppose 10 50-ballot clusters were drawn at random, and the net inflation of the margin for those 500 ballots were multiplied by 100. The result would be a more reliable estimate of the total net inflation of the margin in the contest than we would get from a single cluster of 500 ballots. But it still would be a less reliable estimate than we would get from a simple random sample of 500 ballots.
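A simulation illustrates how reliability improves as the clusters shrink. The sketch below (Python) uses the running example of 50,000 ballots with the 1,000 misinterpreted ballots packed into 2 precincts — my choice of a hard case for cluster sampling — and estimates the total number of misinterpreted ballots three ways, each from 500 audited ballots:

```python
import random
import statistics

random.seed(2010)

# 50,000 ballots; the 1,000 misinterpreted ones (1s) fill the first
# 2 precincts of 500 -- a hard case for cluster sampling.
ballots = [1] * 1000 + [0] * 49000

def estimate(cluster_size):
    """Audit 500 ballots drawn as random clusters of `cluster_size`,
    then scale the count of misinterpreted ballots up to the contest."""
    n_clusters = 50000 // cluster_size
    chosen = random.sample(range(n_clusters), 500 // cluster_size)
    count = sum(sum(ballots[c * cluster_size:(c + 1) * cluster_size])
                for c in chosen)
    return count * 100  # the sample is 1/100 of the contest

results = {}
for size in (500, 50, 1):
    ests = [estimate(size) for _ in range(2000)]
    results[size] = (statistics.mean(ests), statistics.stdev(ests))
    print("cluster size", size,
          "mean", round(results[size][0]),
          "sd", round(results[size][1]))
```

All three estimators average about 1,000 (they are unbiased), but the spread of the estimates shrinks dramatically as the clusters get smaller, from whole precincts to batches of 50 to single ballots.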

Reducing cluster size gives more information about the difference between the apparent margin and the margin a full hand count would show, for the same counting effort.

To my knowledge, there has been only one risk-limiting audit using clusters of size one, that is, a single-ballot risk-limiting audit. It was conducted in Yolo County, California, in November 2009. See Stark, P.B., 2009. Efficient post-election audits of multiple contests: 2009 California tests. Refereed paper presented at the 2009 Conference on Empirical Legal Studies. (preprint: http://ssrn.com/abstract=1443314) The biggest obstacle to conducting single-ballot audits or small-batch audits is the design of current commercial vote tabulation systems. They do not provide a record of the machine interpretation of individual ballots or small batches of ballots suitable for auditing. If vote tabulation systems were designed with single-ballot or small-batch auditing in mind, the cost savings to jurisdictions that wish to perform risk-limiting audits would be enormous.

With single-ballot audits, special precautions need to be taken to ensure voter privacy and discourage the buying and selling of votes. There are many ways this can be accomplished. For instance, rather than publish the interpretation of individual ballots, the interpretation could be committed to by transmitting a digitally signed file to the Secretary of State; only precinct-level subtotals would be reported to the public. Alternatively, cryptographic commitments might be used. Or the reporting system could dissociate votes in different contests on the same ballot, giving each (contest, ballot) pair a randomly generated but unique identifier, so that the physical ballot can be retrieved and checked, but no one without access to the physical ballots would know the pattern of votes on an individual ballot.

**Acknowledgments.**
I am grateful to Mark Lindeman, Joseph Lorenzo Hall, Mike Higgins, and John McCarthy
for encouragement and helpful comments.

© 2010 P.B. Stark. Last modified 27 March 2010.

http://statistics.berkeley.edu/Preprints/smallBatchHeuristics10.htm