Tools for Comparison Risk-Limiting Election Audits

This page implements some tools to conduct "comparison" risk-limiting audits as described in A Gentle Introduction to Risk-Limiting Audits (AGI), by Lindeman and Stark. It also extends the ballot-level comparison audit method in that paper to cover super-majority contests, such as ballot measures that require at least 2/3 of the valid votes to pass. For an implementation of tools for "ballot-polling" risk-limiting audits as described in AGI, see http://statistics.berkeley.edu/~stark/Vote/ballotPollTools.htm.


A risk-limiting audit (RLA) is a procedure that is guaranteed to have a large chance of progressing to a full hand count of the votes if the electoral outcome is wrong. The outcome according to the hand count then replaces the outcome being audited. The risk limit is the maximum chance that the audit will not progress to a full hand count if the electoral outcome is incorrect, no matter why it is incorrect—whether because of voter error, bugs, pollworker error, or deliberate fraud—provided the audit trail is complete and accurate.

The trustworthiness of the paper trail is essential: applying RLA procedures to an untrustworthy paper trail cannot ensure that incorrect electoral outcomes will be corrected, and can in fact turn a correct reported outcome into an incorrect outcome. A compliance audit to establish whether the paper trail is trustworthy is required. Generally, applying RLA procedures to the printout of ballot-marking devices cannot check whether reported election outcomes are correct or correct wrong outcomes, and thus cannot limit the risk that an RLA is supposed to limit. (Nonetheless, there is value in using RLA procedures to check whether tabulation errors were large enough to alter reported outcomes.)

There are many methods for conducting risk-limiting audits. This page performs calculations for a particularly simple method described in AGI. The method is a type of comparison audit: It involves comparing the interpretation of ballots according to the voting system (the cast vote record or CVR) to a human interpretation of the same ballot. Differences between the two interpretations are noted. Determining whether the audit can stop depends on the number and nature of those differences, the number of ballots examined so far, the risk limit, and the diluted margin. The smaller the risk limit or the diluted margin, the larger the number of ballots that must be audited, all else equal.

The difference can be neutral, an understatement, or an overstatement, depending on the effect of changing the voting system interpretation of the ballot card to match the hand interpretation. Understatements make the margin of victory appear smaller than it really was; overstatements make the margin of victory appear larger than it really was.

Consider the pairwise margin between each winner and each loser in a contest. For instance, a city council election might involve voting for three candidates from a pool of ten, to fill three seats on the council. Then each of the three reported winners can be paired with each of the seven reported losers, giving twenty-one pairwise margins in that contest. If changing the interpretation of a ballot card according to the voting system to make it match the human interpretation of the ballot card would widen every pairwise margin in every contest under audit, that ballot card has an understatement. Understatements do not call the outcome into question. If changing the interpretation according to the voting system to match the human interpretation would narrow any pairwise margin in any contest under audit, the ballot has an overstatement. If enough ballot cards have overstatements, the outcome could be wrong.

Quantifying understatements and overstatements is slightly different for plurality contests and super-majority contests; see below.

The sample size calculations for this method depend on the risk limit as well as the diluted margin, which is the margin in votes divided by the number of ballot cards cast in any of the contests being audited together, including undervotes and overvotes and cards that do not contain the contest. Undervotes and overvotes are included because they might have been intended as votes for candidates, misinterpreted by the voting system as undervotes or overvotes. Similarly, the voting system might report that a card contains the contest when it does not, and vice versa.
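As a rough illustration of the definition above, the following sketch computes a diluted margin for a group of plurality contests. The input layout, the function name `diluted_margin`, and the vote totals are assumptions made for this example, not part of the tools on this page; ties are not handled.

```python
def diluted_margin(contests, ballot_cards_cast):
    """Smallest winner-loser vote margin across contests, divided by cards cast.

    `contests` maps a contest name to (vote_totals, number_of_winners);
    this layout is hypothetical and chosen only for the example.
    """
    smallest = None
    for totals, n_winners in contests.values():
        ranked = sorted(totals.values(), reverse=True)
        # The tightest pairwise margin is between the winner with the
        # fewest votes and the loser with the most votes.
        margin = ranked[n_winners - 1] - ranked[n_winners]
        smallest = margin if smallest is None else min(smallest, margin)
    return smallest / ballot_cards_cast


# Made-up totals: a two-candidate contest and a vote-for-three contest.
contests = {
    "mayor": ({"A": 6000, "B": 3500}, 1),
    "council": ({"C": 5000, "D": 4500, "E": 4200, "F": 3000}, 3),
}
# The denominator counts every card cast in either contest, including
# undervotes, overvotes, and cards that do not contain the contest.
print(diluted_margin(contests, 10_000))
```

Here the smallest margin is the 1,200 votes between the third council winner and the council loser, so the diluted margin is 1,200 / 10,000.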

Efficient risk-limiting audits generally inspect ballot cards by hand until there is convincing evidence that the outcome according to a full hand count would agree with the outcome for the contests under audit. If convincing evidence is not forthcoming, the audit progresses to a full hand count, which is used to correct the outcomes under audit if the hand count shows they are incorrect.

The tools on this page help perform the following steps:

Choose the number of ballots to audit initially
Select a random sample of ballots
Find those ballots using a ballot manifest
Determine whether the audit can stop, given the differences between the machine and human interpretations of the ballots in the sample, and if not, estimate how many additional ballots will need to be audited

Visualizing the required sample size

The ultimate sample size required to confirm the outcome depends on the diluted margin and the number of errors (both understatements and overstatements) found in the sample, the social choice function (plurality or supermajority), as well as the risk limit. The following graph plots the sample size as a function of the number of 1-vote overstatement errors the audit finds, for diluted margins of 0.5%, 1%, 5%, 10%, and 20%, all at risk limit 5%. It also plots the expected final sample size as a function of the diluted margin, for various rates of observed 1-vote overstatements. The plot assumes that there are no 2-vote overstatements and no understatements.

Figure: expected sample size as a function of the diluted margin, 5% risk limit (alternate view: final sample size as a function of observed 1-vote overstatements).

Initial sample size

The initial sample size tool lets you enter the particulars of the contest(s) to be audited as a group: the total ballot cards cast across all the contests combined, and the vote totals for each candidate in each contest. The form helps you anticipate the number of randomly selected ballots that will need to be compared to their CVRs to attain a given limit on the risk, under assumptions about the rates of differences anticipated. It is completely legitimate to sample one ballot card at a time and check whether enough have been sampled using the "stopping sample size tool" (later on this page), but this form can help auditors anticipate how much work will be required and retrieve ballot cards more efficiently, by reducing the number of times a given batch of ballots is opened.

Enter the total number of ballot cards cast in all contests to be audited. Add candidates and contests as necessary until the results from all contests have been entered. Indicate whether the contest is plurality or supermajority. For plurality contests, select the number of winners, e.g., 1 for a first-past-the-post contest. For supermajority contests, input the percentage of votes required to win, e.g., 50% or 66.667% (include the percent sign). This tool allows at most two candidates in supermajority contests, i.e., "yes" and "no." The candidate listed first is assumed to be the candidate that requires the supermajority. For instance, to audit a ballot measure that requires a 2/3 majority to pass, list "yes" as the first candidate and "no" as the second candidate. Enter the desired risk limit and the expected rates of one-vote and two-vote differences. Select whether to round up the expected number of differences of each type. Finally, click "calculate" to find the starting sample size.

By default, this form assumes that the rate of one-vote understatements and overstatements is one in a thousand (0.001) and that the rate of two-vote understatements and overstatements is one in ten thousand (0.0001). These values are conservative, in my experience, but the choice is up to the user. The larger these rates are assumed to be, the larger the initial sample size will be. Taking a larger initial sample can avoid needing to expand the sample later, depending on the rate of errors the audit actually finds. Avoiding "escalation" can make the audit less complicated.

Considerations for deciding which contests to audit together

The number of ballot cards the audit must retrieve and inspect before stopping depends on the smallest diluted margin among the contests to be audited together (as well as the risk limit, the errors the audit finds, and so on). All else equal, the larger the diluted margin, the smaller the sample size—when the reported outcome is correct.

Because the diluted margin is the smallest margin in votes divided by the total number of ballot cards cast in all the contests simultaneously under audit, auditing small contests together with large contests can be inefficient if the ballot cards containing those contests can be segregated from each other: Dividing the vote margin in small contests by the number of ballot cards cast in large contests can make the diluted margin very small, which makes the required sample very large. If a jurisdiction organizes ballots by ballot style (e.g., through precinct-based voting or by sorting vote by mail ballots before scanning), it can substantially reduce the work of auditing multiple contests and small contests.

Generally, if two contests overlap substantially—for instance, if both are jurisdiction-wide contests—it is more economical to audit them together: Fewer ballot cards will need to be manually inspected in all. Conversely, if two contests do not overlap at all, it is more efficient to audit them separately, if the ballots are organized physically by ballot style. The use of vote centers tends to make it harder to audit efficiently, unless the ballots or ballot cards are sorted before scanning.

Auditing small contests together with (overlapping) large contests generally is not efficient unless the vote margin in the small contests is a substantial fraction of the vote margin in the large contests. That is, auditing small contests that have large percentage margins together with large contests that have small percentage margins can be efficient, but auditing small contests together with large contests that have comparable percentage margins generally is not efficient, because the vote margin in the small contest, divided by the number of ballot cards cast in the combination, makes the diluted margin of the combination much smaller.

The tool above can be used to explore whether it makes sense to audit a collection of contests together by checking whether the required starting sample size when they are audited together is greater than the sum of the required starting sample sizes when they are audited separately. (If you experiment with different groupings of contests, be sure to change the entry for "Ballot cards cast in all contests" to reflect only the contests that are to be audited together.)
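The comparison described above can be sketched numerically. The example below assumes no differences are expected, in which case the starting-size formula in the technical notes reduces to $-2g\log(a)/m$; the contest sizes and margins are made up, and rounding up to a whole card is an assumption of this sketch.

```python
import math

G = 1.03905  # the value of g used on this page


def sample_size_no_errors(margin_votes, cards_cast, risk_limit=0.05):
    """Starting size when no differences are expected: -2 g log(a) / m."""
    diluted = margin_votes / cards_cast
    return math.ceil(-2 * G * math.log(risk_limit) / diluted)


# Hypothetical county-wide contest: 100,000 cards, margin 2,000 votes.
# Hypothetical city contest inside the county: 5,000 cards, margin 300 votes.
together = sample_size_no_errors(min(2_000, 300), 100_000)
separate = (sample_size_no_errors(2_000, 100_000)
            + sample_size_no_errors(300, 5_000))
print(together, separate)
```

With these numbers, auditing the two contests together requires far more cards than auditing them separately, which only helps in practice if the city's ballot cards can be physically segregated from the rest.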

Starting sample size technical notes

For plurality contests, the initial sample size form implements this formula from AGI:

$$ n_0 = -2g \log(a)/((m + 2g(r_1 \log(1-1/(2g)) + r_2 \log(1 - 1/g) + s_1 \log(1+1/(2g)) + s_2 \log(1+1/g)))) $$

with $m$ equal to the diluted margin, $a$ equal to the risk limit, $g = 1.03905$, $r_1$ the expected rate of 1-vote overstatements per ballot, $r_2$ the expected rate of 2-vote overstatements per ballot, $s_1$ the expected rate of 1-vote understatements per ballot, and $s_2$ the expected rate of 2-vote understatements per ballot. The diluted margin is the smallest margin in votes, divided by the total number of ballot cards cast, including undervoted and overvoted ballot cards.
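A minimal sketch of the formula for $n_0$ follows. The function name and argument names are assumptions; the default rates are the defaults this page describes for the form (0.001 for one-vote and 0.0001 for two-vote differences). The guard against a non-positive denominator is an addition of this sketch.

```python
import math


def initial_sample_size(m, a, r1=0.001, r2=0.0001, s1=0.001, s2=0.0001,
                        g=1.03905):
    """n_0 from the formula above, before any rounding of expected counts.

    m: diluted margin, a: risk limit,
    r1, r2: expected rates of 1- and 2-vote overstatements,
    s1, s2: expected rates of 1- and 2-vote understatements.
    """
    denom = m + 2 * g * (
        r1 * math.log(1 - 1 / (2 * g))
        + r2 * math.log(1 - 1 / g)
        + s1 * math.log(1 + 1 / (2 * g))
        + s2 * math.log(1 + 1 / g)
    )
    if denom <= 0:
        # The expected overstatement rates are too large for this margin:
        # the formula has no finite positive solution.
        raise ValueError("expected error rates are too large for this margin")
    return -2 * g * math.log(a) / denom


print(initial_sample_size(0.02, 0.05))
```

Raising the expected overstatement rates shrinks the denominator and so increases $n_0$; raising the expected understatement rates has the opposite effect.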

The number n₀ is then adjusted to take into account the fact that differences must be whole numbers, as follows: The expected number of differences in the sample of each type is n₀ times the expected rate of those differences. Depending on which checkboxes are checked, the expected numbers are either rounded to the nearest whole number, or rounded up. Then those numbers of discrepancies are plugged into the stopping rule described below, to determine how many ballot cards would have to be audited if the estimated number of differences of each type were to be observed in the sample. That number is then used once again to estimate the number of differences of each type the sample would contain; the results are rounded to the nearest integer and plugged into the stopping rule a second time. The result is then the starting sample size.
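One possible reading of the adjustment just described is sketched below. It is an interpretation, not the page's own code: the function names are hypothetical, the two flags are assumed to apply to both overstatements and understatements of the given size, ties in "nearest" rounding are assumed to round up, and the final result is assumed to be rounded up to a whole card. The stopping rule used is the formula given in the technical notes later on this page.

```python
import math

G = 1.03905


def stopping_size(m, a, o1, o2, u1, u2, g=G):
    """Stopping rule from the technical notes, not yet rounded."""
    total = (math.log(a)
             + o1 * math.log(1 - 1 / (2 * g)) + o2 * math.log(1 - 1 / g)
             + u1 * math.log(1 + 1 / (2 * g)) + u2 * math.log(1 + 1 / g))
    return -2 * g * total / m


def nearest(x):
    """Round to the nearest whole number (halves round up; an assumption)."""
    return math.floor(x + 0.5)


def starting_size(m, a, r1, r2, s1, s2,
                  round_up_1=True, round_up_2=True, g=G):
    # Step 1: n_0 from the expected rates.
    n0 = -2 * g * math.log(a) / (m + 2 * g * (
        r1 * math.log(1 - 1 / (2 * g)) + r2 * math.log(1 - 1 / g)
        + s1 * math.log(1 + 1 / (2 * g)) + s2 * math.log(1 + 1 / g)))
    # Step 2: expected counts, rounded as the two flags request.
    rnd1 = math.ceil if round_up_1 else nearest
    rnd2 = math.ceil if round_up_2 else nearest
    n1 = stopping_size(m, a, rnd1(n0 * r1), rnd2(n0 * r2),
                       rnd1(n0 * s1), rnd2(n0 * s2), g)
    # Step 3: re-estimate the counts from n1, round to nearest,
    # and apply the stopping rule a second time.
    n2 = stopping_size(m, a, nearest(n1 * r1), nearest(n1 * r2),
                       nearest(n1 * s1), nearest(n1 * s2), g)
    return math.ceil(n2)


print(starting_size(0.02, 0.05, 0.001, 0.0001, 0.001, 0.0001))
```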

Explanation of auditing supermajority contests

For super-majority contests, the calculations are a bit different.

Let's label the two candidates "yes" and "no." "Yes" needs to have a fraction $f$ of the valid votes to win. If "yes" falls short of that, "no" wins. Suppose "yes" was reported to have won:

$ (\mbox{yes votes}) \ge f \times (\mbox{yes votes } + \mbox{ no votes}) $.

Then, $ (\mbox{yes votes}) \ge \frac{f}{1-f} (\mbox{no votes}) $.

The "margin" is $ (\mbox{yes votes}) - \frac{f}{1-f} (\mbox{no votes}) $.

The auditing method that is the basis for the comparison audit method in AGI is derived in Stark, P.B., 2010. Super-Simple Simultaneous Single-Ballot Risk-Limiting Audits, EVT/WOTE https://www.usenix.org/legacy/event/evtwote10/tech/full_papers/Stark.pdf. It tests the hypothesis that the total error in the interpretation of all ballot cards does not exceed the margin, on the assumption that the overstatement error on any particular card cannot be larger than 2. It can thus be used to audit supermajority contests if they can be translated to the same statistical problem. That turns out just to require rescaling the "margin" appropriately and counting discrepancies in a manner consistent with that rescaling.

We first re-scale this margin so that the largest overstatement a CVR can have is 2. The largest possible discrepancy is to tabulate a "no" on the ballot card as a "yes" vote in the CVR, which would alter the margin by $$ 1 + \frac{f}{1 - f} = \frac{1}{1-f} \mbox{ votes}. $$ To make this no larger than 2, we define the equivalent plurality margin to be: $$ m_y \equiv 2 \frac{(\mbox{yes votes}) - \frac{f}{1-f} \times (\mbox{no votes})}{\frac{1}{1 - f}} = 2(1-f) (\mbox{yes votes}) - 2f \times (\mbox{no votes}).$$ In the special case $f = 1/2$, this is the ordinary (plurality) margin in votes.

Conversely, if "yes" was reported to have received less than the threshold $f$, that is equivalent to "no" winning a super-majority contest with required winning fraction $f' \equiv 1-f$, so the corresponding equivalent plurality margin is $$ m_n = 2(1-f') (\mbox{no votes}) - 2f' (\mbox{yes votes}) = 2f (\mbox{no votes}) - 2(1-f) (\mbox{yes votes}) = -m_y.$$ Thus the two situations are equivalent up to a sign change. In the special case $f = 1/2$, this is again the ordinary (plurality) margin in votes.
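The equivalent plurality margin can be computed directly from the definition of $m_y$. The sketch below uses exact rational arithmetic to avoid floating-point surprises with thresholds such as 2/3; the function name and the vote totals are assumptions for illustration.

```python
from fractions import Fraction


def equivalent_plurality_margin(yes_votes, no_votes, f):
    """m_y = 2(1-f) * yes - 2f * no; positive when "yes" meets threshold f."""
    f = Fraction(f)
    return 2 * (1 - f) * yes_votes - 2 * f * no_votes


# Made-up totals for a measure that needs 2/3 of the valid votes.
m_y = equivalent_plurality_margin(7000, 3000, Fraction(2, 3))
m_n = -m_y  # margin if "no" were the reported winner
print(m_y, m_n)

# With f = 1/2 the result is the ordinary margin in votes.
print(equivalent_plurality_margin(7000, 3000, Fraction(1, 2)))
```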

How do we count discrepancies the audit encounters? We just find their effect on $m_y$ if "yes" was the reported winner or on $m_n$ if "no" was the reported winner. Let's start with "yes." If the CVR has a yes vote that the ballot card does not show, that contributes $2(1-f)$ votes to the overstatement for the card; if the ballot card has a yes vote that the CVR does not show, that contributes $2(1-f)$ votes to the understatement for the card. If the ballot card has a no vote that the CVR does not show, that contributes $2f$ to the overstatement for the card; if the CVR has a no vote that the card does not show, that contributes $2f$ to the understatement for the card.

These contributions add. For instance, if the card shows a no vote but the CVR shows a yes vote, the total discrepancy is an overstatement of $2(1-f) + 2f = 2$. Table 1n shows the possibilities. If "no" is the reported winner, interchange the role of understatements and overstatements (equivalently, interchange the role of CVRs and hand interpretations).

Table 1n: How to count discrepancies between hand interpretation and CVR for a super-majority contest when it takes a fraction $f$ of the valid votes to win and "yes" is the reported winner. If "yes" is the reported loser, substitute understatements for overstatements, and vice versa: the magnitudes are the same but the sign is reversed.
	scenario 1	scenario 2	scenario 3	scenario 4	scenario 5	scenario 6
CVR	yes	yes	undervote	undervote	no	no
Hand	no	undervote	yes	no	yes	undervote
error	2 over	$2(1-f)$ over	$2(1-f)$ under	$2f$ over	2 under	$2f$ under
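The entries of Table 1n follow from the effect of each interpretation on $m_y$. A small sketch, with a hypothetical function name and a sign convention chosen for the example (positive for overstatements, negative for understatements):

```python
from fractions import Fraction


def supermajority_error(cvr, hand, f, yes_reported_winner=True):
    """Signed error in votes: positive = overstatement, negative = understatement.

    `cvr` and `hand` are each "yes", "no", or "undervote".
    """
    f = Fraction(f)
    # Contribution of one vote of each kind to m_y = 2(1-f)*yes - 2f*no.
    weight = {"yes": 2 * (1 - f), "no": -2 * f, "undervote": Fraction(0)}
    overstatement_of_m_y = weight[cvr] - weight[hand]
    # If "no" was the reported winner, the roles are interchanged.
    return overstatement_of_m_y if yes_reported_winner else -overstatement_of_m_y


f = Fraction(2, 3)
for cvr, hand in [("yes", "no"), ("yes", "undervote"), ("undervote", "yes"),
                  ("undervote", "no"), ("no", "yes"), ("no", "undervote")]:
    print(cvr, hand, supermajority_error(cvr, hand, f))
```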

Suppose $f \ge 1/2$, a "genuine" super-majority contest. (Not every threshold is that large: there are examples in U.S. primary elections where $f = 15\%$.) Then $1 \le 2f \le 2$ and $2(1-f) \le 1$. Rounding down understatement errors and rounding up overstatement errors is conservative. Thus if we count discrepancies as follows, the audit is still risk-limiting:

Table 2n: A conservative way to count discrepancies between hand interpretation and CVR for a super-majority contest when it takes a fraction $f \ge 1/2$ of the valid votes to win and "yes" is the reported winner.
	scenario 1	scenario 2	scenario 3	scenario 4	scenario 5	scenario 6
CVR	yes	yes	undervote	undervote	no	no
Hand	no	undervote	yes	no	yes	undervote
error	2 over	1 over	0	2 over	2 under	1 under
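The rounding rule behind Table 2n (round overstatements up, round understatements down in magnitude) amounts to taking the ceiling of the signed fractional error. The sketch below assumes "yes" is the reported winner and uses a hypothetical function name; for $f$ strictly between 1/2 and 1 it reproduces the entries of Table 2n.

```python
import math
from fractions import Fraction


def conservative_error(cvr, hand, f):
    """Whole-vote signed error when "yes" is the reported winner, f >= 1/2.

    Positive = overstatement, negative = understatement.
    """
    f = Fraction(f)
    weight = {"yes": 2 * (1 - f), "no": -2 * f, "undervote": Fraction(0)}
    exact = weight[cvr] - weight[hand]
    # The ceiling rounds positive values (overstatements) up and negative
    # values (understatements) toward zero, which is the conservative direction.
    return math.ceil(exact)


f = Fraction(2, 3)
print([conservative_error(c, h, f) for c, h in [
    ("yes", "no"), ("yes", "undervote"), ("undervote", "yes"),
    ("undervote", "no"), ("no", "yes"), ("no", "undervote")]])
```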

Table 3n: A conservative way to count discrepancies between hand interpretation and CVR for a super-majority contest when it takes a fraction $f \ge 1/2$ of the valid votes to win and "yes" was reported to lose.
	scenario 1	scenario 2	scenario 3	scenario 4	scenario 5	scenario 6
CVR	yes	yes	undervote	undervote	no	no
Hand	no	undervote	yes	no	yes	undervote
error	2 under	0	1 over	1 under	2 over	2 over

Thus the same software can be used to audit super-majority contests. The audit would be sharper, at the expense of slightly more bookkeeping, if it kept track of fractional overstatements and understatements as shown in Table 1n. However, the approach in SHANGRLA is sharper still: Stark, P.B., 2020. Sets of Half-Average Nulls Generate Risk-Limiting Audits: SHANGRLA, Voting '20, to appear. Preprint: http://arxiv.org/abs/1911.10035

Random sampling

The next tool helps generate a pseudo-random sample of ballot cards. To start, input a random seed with at least 20 digits (generated by rolling a 10-sided die, for instance), the number of ballot cards from which you want a sample, and the number of ballot cards you want in the sample. Further below, there is a form to help find the individual, randomly selected ballot cards among the batches in which ballot cards are stored.

Technical notes

The "seed," concatenated with a comma and the "Sample number," is passed through the SHA-256 hash function. The result is displayed as "Hashed value (for testing)". The hashed value, interpreted as a hexadecimal number, is divided by "Number of objects from which to sample." One is added to the remainder of that division to get "Randomly selected item," which will be a number between 1 and "Number of objects from which to sample," inclusive. Clicking "draw sample" successively adds one to "Sample number" and recomputes "Hashed value" and "Randomly selected item" "Draw this many objects" times. Selected items accumulate in "Ballot cards selected" (and "Ballot cards selected, sorted"), which reset if the seed or the number of objects changes. The same ballot card might be selected more than once. Duplicates are removed in "Ballot cards selected, duplicates removed." Ballot cards selected more than once, and the frequencies of those ballot cards, are in "Repeated ballot cards." Clicking the "reset" button clears the history but leaves the seed.
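The procedure described above can be sketched in a few lines. This is an illustration of the description, not the page's own code: the function name is hypothetical, and the sketch assumes the seed is handled as a text string, encoded as UTF-8, and that the first sample number is 1, so it may not reproduce the page's output exactly.

```python
import hashlib


def draw_sample(seed, n_objects, n_draws, first_sample_number=1):
    """Return n_draws item numbers in 1..n_objects, possibly with repeats."""
    selected = []
    for sample_number in range(first_sample_number,
                               first_sample_number + n_draws):
        # Hash the seed, a comma, and the sample number.
        text = f"{seed},{sample_number}"
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Interpret the hash as a hexadecimal integer, take the remainder
        # after dividing by the number of objects, and add one.
        selected.append(int(digest, 16) % n_objects + 1)
    return selected


sample = draw_sample("12345678901234567890", 1000, 10)
print(sample)
print(sorted(set(sample)))  # sorted, duplicates removed
```

Because the draws depend only on the seed and the sample number, anyone who knows the seed can regenerate and check the sequence.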

I learned about this method of generating pseudo-random numbers from Ronald L. Rivest; it is related to a method described in http://tools.ietf.org/html/rfc3797. The SHA-256 hash algorithm produces hash values that are hard to predict from the input. They are also roughly equidistributed as the input varies. The advantages of this approach for election auditing and some other applications include the following:

The SHA-256 algorithm is public and many implementations are available in many languages. (The JavaScript implementation used by this page was written by Brian Turek; the JavaScript routines for arithmetic on long integers, such as the SHA-256 hashed values, were written by Leemon Baird.)
Given the seed, anyone can verify that the sequence of numbers generated was correct—that it indeed comes from applying SHA-256.
Unless the seed is known, the sequence of values generated is unpredictable (so the result is hard to "game"). It is very hard to distinguish the output from independent, uniformly distributed samples.

For comparison, a reference implementation of this approach in Python written by Ronald L. Rivest is available at http://people.csail.mit.edu/rivest/sampler.py.

For more detail, see http://statistics.berkeley.edu/~stark/Java/Html/sha256Rand.htm.

Find ballot cards using a ballot manifest

Generally, ballot cards are stored in batches, for instance, separated by precinct and mode of voting. To make it easier to retrieve individual ballot cards for auditing, it helps to have a ballot manifest that describes how the ballot cards are stored. For instance, we might have 1,000 ballot cards stored as follows:

Batch label	ballot cards
Polling place precinct 1	130
Vote by mail precinct 1	172
Polling place precinct 2	112
Vote by mail precinct 2	201
Polling place precinct 3	197
Vote by mail precinct 3	188

If ballot card 500 is selected for audit, which card is that? If we take the listing of batches in the order given by the manifest, and we require that within each batch, the ballot cards are in an order that does not change during the audit, then the 500th ballot card is the 86th ballot card among the vote by mail ballot cards for precinct 2: The first three batches have a total of 130+172+112 = 414 ballot cards. The first card in the fourth batch is ballot card 415. Ballot card 500 is the 86th card in the fourth batch.
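The arithmetic in this example can be written as a short lookup over cumulative batch sizes. The function name `locate` is hypothetical; the manifest is the one in the table above.

```python
from bisect import bisect_left
from itertools import accumulate

manifest = [
    ("Polling place precinct 1", 130),
    ("Vote by mail precinct 1", 172),
    ("Polling place precinct 2", 112),
    ("Vote by mail precinct 2", 201),
    ("Polling place precinct 3", 197),
    ("Vote by mail precinct 3", 188),
]


def locate(card_number, manifest):
    """Map an overall card number to (batch label, position within batch)."""
    # Running totals: the number of the last card in each batch.
    ends = list(accumulate(size for _, size in manifest))
    if not 1 <= card_number <= ends[-1]:
        raise ValueError("card number is outside the manifest")
    i = bisect_left(ends, card_number)       # first batch that reaches the card
    cards_before = ends[i - 1] if i else 0   # cards in all earlier batches
    return manifest[i][0], card_number - cards_before


print(locate(500, manifest))
```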

The ballot look-up tool transforms a list of ballot card numbers and a ballot manifest into a list of ballot cards in each batch. Batch labels should not contain commas. Use a comma to separate each batch label from the number of ballot cards in that batch (or the range of ballot card numbers or the set of ballot card identifiers—see below). The manifest should have one line per batch and no empty lines.

For instance, to input the ballot manifest above, you would enter:

Polling place precinct 1, 130
Vote by mail precinct 1, 172
Polling place precinct 2, 112
Vote by mail precinct 2, 201
Polling place precinct 3, 197
Vote by mail precinct 3, 188

Some jurisdictions number the ballot cards cast in an election. If all the ballot cards in an election are numbered sequentially, the numbers on the ballot cards that contain a particular contest might not be sequential. For instance, an election might cover precincts 1, 2, and 3, but only voters in precincts 1 and 3 are eligible to vote in the contests to be audited with the current sample. If the ballots have more than one card, cards containing a given contest in general will not be numbered sequentially. In the previous example, suppose that the ballot consisted of a single card and that the jurisdiction had stamped numbers on all the cards, sequentially, so that the ballot cards from the polling place in precinct 1 were numbered 1 to 130, the vote by mail ballot cards from precinct 1 were numbered 131 to 302, the ballot cards from the polling place in precinct 2 were numbered 303 to 414, and so on, as summarized in the following table:

Batch label	ballot card range
Polling place precinct 1	1 to 130
Vote by mail precinct 1	131 to 302
Polling place precinct 2	303 to 414
Vote by mail precinct 2	415 to 615
Polling place precinct 3	616 to 812
Vote by mail precinct 3	813 to 994
Provisional ballots for precinct 1	996, 998, 1000
Provisional ballots for precinct 2	997
Provisional ballots for precinct 3	995, 999

Since the ballot cards already have numbers on them, it makes sense to look them up using those numbers. If we were auditing a collection of contests that included only precincts 1 and 3, the ballot cards subject to audit would be the 686 cards labeled 1 to 130, 131 to 302, 616 to 812, and 813 to 994, and 995, 996, 998, 999, and 1000. In this case, the ballot manifest would include only the six batches that comprise precincts 1 and 3, not all nine batches; there are only 686 ballot cards in these batches. Each line in the manifest would consist of a batch label and a range of ballot card numbers, where the range is denoted by a colon, or of a batch label and a set of ballot card identifiers in parentheses, separated by spaces. Ballot card ranges cannot have gaps: There can be no missing numbers within the range for any single batch. (If there is in fact a gap, input the numbers as a set of identifiers, rather than as a range.) Again, separate the label from the range or set of ballot card numbers by a comma. The label must not contain any commas, and the range of ballot card numbers or set of identifiers must not contain commas. In this example, we would enter the ballot manifest as follows:

Polling place precinct 1, 1:130
Vote by mail precinct 1, 131:302
Polling place precinct 3, 616:812
Vote by mail precinct 3, 813:994
Provisional precinct 1, (996 998 1000)
Provisional precinct 3, (995 999)

The total number of ballot cards in the manifest must equal the number cast in the contests that are to be audited together using the sample (686 in this example).
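A minimal sketch of how a manifest in this format might be parsed is shown below. The function name is hypothetical, and the sketch assumes that the k-th sampled item corresponds to the k-th card in manifest order; that mapping is an assumption, not something the text above specifies.

```python
def parse_manifest(text):
    """Return a list of (batch label, card identifier) in manifest order."""
    cards = []
    for line in text.strip().splitlines():
        # Labels and card specifications contain no commas, so each line
        # splits into exactly two parts.
        label, spec = (part.strip() for part in line.split(","))
        if spec.startswith("("):
            # A set of identifiers in parentheses, separated by spaces.
            ids = [int(x) for x in spec.strip("()").split()]
        elif ":" in spec:
            # A gap-free range written first:last.
            first, last = (int(x) for x in spec.split(":"))
            ids = list(range(first, last + 1))
        else:
            # A plain count: cards are numbered 1..count within the batch.
            ids = list(range(1, int(spec) + 1))
        cards.extend((label, card_id) for card_id in ids)
    return cards


manifest_text = """
Polling place precinct 1, 1:130
Vote by mail precinct 1, 131:302
Polling place precinct 3, 616:812
Vote by mail precinct 3, 813:994
Provisional precinct 1, (996 998 1000)
Provisional precinct 3, (995 999)
"""
cards = parse_manifest(manifest_text)
print(len(cards))   # should match the number of cards cast in these contests
print(cards[302])   # the 303rd card in manifest order
```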

Should more ballot cards be audited?

The stopping sample size tool determines whether enough cards have been examined for the audit to stop, and if not, estimates how many more will need to be audited. The answer depends on the risk limit, the margin, and the differences between the cast vote records and the manual inspection of the ballot cards in the sample.

Differences matter according to how they affect the pairwise margin between some winner and some loser in some contest. Suppose we are auditing a mayoral contest with four candidates, a city council contest that allows voting for up to three of ten candidates, and a simple measure (not a supermajority contest) that involves voting either "yes" or "no." The mayoral contest has three pairwise margins: The winner can be paired with each of the three losers. The city council contest has twenty-one pairwise margins: each of the three winners can be paired with each of the seven losers. The measure has but one pairwise margin, since it has only one winner and one loser. In all, there are 3+21+1 = 25 pairwise margins among the three contests being audited.

If there is any difference between the cast vote record and the human interpretation of a ballot, that ballot as a whole may have an understatement of one or two votes, or an overstatement of one or two votes. No matter how many contests on the ballot have differences and no matter how many candidates in those contests have differences, the ballot as a whole has an understatement of one or two votes, or an overstatement of one or two votes, or neither an understatement nor an overstatement. (Of course, the sample might contain many ballots in each of these categories.)

If changing the interpretation of the ballot according to the voting system to make it match the human interpretation of the ballot would widen every pairwise margin in every contest under audit, that ballot has an understatement. If it would widen every pairwise margin in every contest by two votes, the ballot has a two-vote understatement; otherwise it has a one-vote understatement. If the ballot does not contain every contest under audit, it cannot have an understatement. Since there is an understatement only if changing the machine interpretation of the ballot to match the hand interpretation would increase every pairwise margin, understatements are quite rare. Even though understatements indicate that the voting system misinterpreted votes, they do not call the outcome into question, so they do not increase the sample size required to confirm the outcome.

If changing the interpretation of the ballot according to the voting system to match the human interpretation of the ballot would narrow any pairwise margin in any contest under audit, that ballot has an overstatement. If changing the interpretation of the ballot according to the voting system to match the human interpretation of the ballot would narrow any pairwise margin in any contest under audit by two votes, that ballot has a two-vote overstatement. No matter how many margins would be narrowed by one or two votes, the overstatement on a ballot is at most two votes, because only the maximum overstatement enters the calculations. If enough ballots have overstatements, the outcome could be wrong, so overstatements increase the sample size required to confirm the outcome.

As an example, suppose that we are auditing five plurality contests simultaneously. Tables 1 and 2 below show two hypothetical CVRs and manual interpretations of the same ballots.

Table 1: Hypothetical CVR and hand interpretation of a ballot that contains four of five plurality contests under audit. Overall, the ballot has an overstatement of 2 votes, because that is the largest overstatement of any margin in any of the contests.
	contest 1	contest 2	contest 3	contest 4	contest 5
CVR	undervote	winner	loser	not on ballot	winner
Hand	loser	loser	winner	loser	not on ballot
difference	1 over	2 over	2 under^**	1 over	1 over

^**Contest 3 has an understatement of 2 votes only if the contest has only two candidates. If there are two or more losers in the contest (and only one winner), this contest has an understatement of only one vote, because only one pairwise margin was understated by two votes; the others were understated by only one vote. Similarly, if there are two or more winners in the contest and only one loser, this contest has an understatement of only one vote. If there are at least two winners and at least two losers, there is no understatement in this contest, because at least one pairwise margin was not affected by the difference. Regardless, the ballot has an overstatement of 2 votes, because the ballot has an overstatement of 2 votes in contest 2.

Table 2: Hypothetical CVR and hand interpretation of a ballot that contains four of five contests under audit. Overall, the ballot has an overstatement of 1 vote, because that is the largest overstatement of any margin in any of the contests.
	contest 1	contest 2	contest 3	contest 4	contest 5
CVR	winner	winner	undervote	not on ballot	winner
Hand	overvote	undervote	loser	loser	not on ballot
difference	1 over	1 over	1 over	1 over	1 over
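The rule illustrated by Tables 1 and 2 can be sketched as a maximum over pairwise margins for plurality contests. The function names and data layout are assumptions for this example: each interpretation is a set of candidates with a valid vote, and an undervote, an overvote, or a contest missing from the card is an empty set. Every contest under audit should be listed, including those the card does not contain.

```python
def contest_overstatement(cvr_votes, hand_votes, winners, losers):
    """Largest overstatement of any winner-loser margin in one contest.

    A negative result means every pairwise margin was understated.
    """
    def vote(votes, candidate):
        return 1 if candidate in votes else 0

    return max(
        (vote(cvr_votes, w) - vote(hand_votes, w))
        - (vote(cvr_votes, l) - vote(hand_votes, l))
        for w in winners for l in losers
    )


def ballot_overstatement(contests):
    """Signed error for the whole card: the maximum over all contests."""
    return max(contest_overstatement(*contest) for contest in contests)


W, L = ["W"], ["L"]  # two-candidate contests: one winner, one loser
table_1 = [(set(), {"L"}, W, L), ({"W"}, {"L"}, W, L), ({"L"}, {"W"}, W, L),
           (set(), {"L"}, W, L), ({"W"}, set(), W, L)]
table_2 = [({"W"}, set(), W, L), ({"W"}, set(), W, L), (set(), {"L"}, W, L),
           (set(), {"L"}, W, L), ({"W"}, set(), W, L)]
print(ballot_overstatement(table_1), ballot_overstatement(table_2))
```

A result of 2 or 1 is a two-vote or one-vote overstatement, 0 is neutral, and -1 or -2 is an understatement of that size.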

Counting discrepancies for super-majority contests is different. The following two tables give a conservative approach.

Table 3: A conservative way to count discrepancies between hand interpretation and CVR for a super-majority contest when it takes a fraction $f \ge 1/2$ of the valid votes to win and "yes" is the reported winner.
	scenario 1	scenario 2	scenario 3	scenario 4	scenario 5	scenario 6
CVR	yes	yes	undervote	undervote	no	no
Hand	no	undervote	yes	no	yes	undervote
error	2 over	1 over	0	2 over	2 under	1 under

Table 4: A conservative way to count discrepancies between hand interpretation and CVR for a super-majority contest when it takes a fraction $f \ge 1/2$ of the valid votes to win and "yes" was reported to lose.
	scenario 1	scenario 2	scenario 3	scenario 4	scenario 5	scenario 6
CVR	yes	yes	undervote	undervote	no	no
Hand	no	undervote	yes	no	yes	undervote
error	2 under	0	1 over	1 under	2 over	2 over

To determine whether the audit can stop, enter the number of ballot cards in the sample with overstatements or understatements of one or two votes, then click "Calculate." If the sample size is not large enough to confirm the outcome based on the number of differences of each type observed, the value of "If no more differences are observed" will be larger than the current sample size, and the value of "Estimated additional ballot cards if difference rate stays the same" will be greater than zero. That value is the estimated number of additional ballot cards that will need to be audited to confirm the outcome at the desired risk limit, assuming that the rate of one and two-vote understatements and overstatements does not change as the sample expands.

Stopping sample size and escalation

If the contest being audited has more than two candidates or positions, the calculation above can be very conservative if overstatements do not affect the margin between the winner with the fewest votes and the loser with the most votes. The stopping rule can be modified to take that into account.

Technical notes

The stopping rule implements the following formula from AGI:

$$ \mbox{stopping sample size} = -2g \left( \log(a) + o_1 \log(1-1/(2g)) + o_2 \log(1 - 1/g) + u_1 \log(1+1/(2g)) + u_2 \log(1+1/g) \right) / m $$

with $m$ equal to the diluted margin, $a$ equal to the risk limit, $o_1$ the number of 1-vote overstatements in the sample, $o_2$ the number of 2-vote overstatements in the sample, $u_1$ the number of 1-vote understatements in the sample, and $u_2$ the number of 2-vote understatements in the sample. In this tool, $g = 1.03905$, but any value greater than one can be used. For $g = 1.03905$, a two-vote overstatement increases the sample size by five times as much as a one-vote overstatement.
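A minimal sketch of the stopping rule follows. The function name is hypothetical; rounding up to a whole card and clamping at zero are assumptions of the sketch. The last lines check numerically that, for g = 1.03905, the penalty for a two-vote overstatement is about five times the penalty for a one-vote overstatement.

```python
import math


def stopping_sample_size(m, a, o1=0, o2=0, u1=0, u2=0, g=1.03905):
    """Cards needed to stop, given the differences observed so far.

    m: diluted margin, a: risk limit,
    o1, o2: counts of 1- and 2-vote overstatements,
    u1, u2: counts of 1- and 2-vote understatements.
    """
    total = (math.log(a)
             + o1 * math.log(1 - 1 / (2 * g))
             + o2 * math.log(1 - 1 / g)
             + u1 * math.log(1 + 1 / (2 * g))
             + u2 * math.log(1 + 1 / g))
    # Many understatements could make the raw value negative; clamp at zero.
    return max(0, math.ceil(-2 * g * total / m))


audited = 350
needed = stopping_sample_size(0.02, 0.05, o1=1)
print(needed, "audit can stop" if audited >= needed else "audit more cards")

# Ratio of the two-vote to the one-vote overstatement penalty for this g.
g = 1.03905
print(math.log(1 - 1 / g) / math.log(1 - 1 / (2 * g)))
```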

The estimates based on differences continuing to occur at the observed rate are based on the method described above for estimating the initial sample size, including the method of rounding the expected number of differences of each type.

P.B. Stark, statistics.berkeley.edu/~stark. http://statistics.berkeley.edu/~stark/Java/Html/auditTools.htm Last modified 9 June 2020.