Tools for Comparison Risk-Limiting Election Audits

This page implements some tools to conduct "comparison" risk-limiting audits as described in A Gentle Introduction to Risk-Limiting Audits (AGI), by Lindeman and Stark. For an implementation of tools for "ballot-polling" risk-limiting audits as described in AGI, see http://statistics.berkeley.edu/~stark/Vote/ballotPollTools.htm.

A risk-limiting audit is a procedure that is guaranteed to have a large chance of progressing to a full hand count of the votes if the electoral outcome is wrong. The outcome according to the hand count then replaces the outcome being audited. The risk limit is the maximum chance that the audit will not progress to a full hand count if the electoral outcome is incorrect, no matter why it is incorrect—whether because of voter error, bugs, pollworker error, or deliberate fraud—provided the audit trail is complete and accurate.

There are many methods for conducting risk-limiting audits. This page performs calculations for a particularly simple method described in AGI. The method is a type of comparison audit: It involves comparing the interpretation of ballots according to the voting system (the cast vote record or CVR) to a human interpretation of the same ballot. Differences between the two interpretations are noted. Determining whether the audit can stop depends on the number and nature of those differences, the number of ballots examined so far, the risk limit, and the diluted margin. The smaller the risk limit or the diluted margin, the larger the number of ballots that must be audited, all else equal.

The difference can be neutral, an understatement, or an overstatement, depending on the effect of changing the voting system interpretation of the ballot to match the hand interpretation: Consider the pairwise margin between each winner and each loser in a contest. For instance, a city council election might involve voting for three candidates from a pool of ten, to fill three seats on the council. Then each of the three winners can be paired with each of the seven losers, giving twenty-one pairwise margins in that contest. If changing the interpretation of a ballot according to the voting system to make it match the human interpretation of the ballot would widen every pairwise margin in every contest under audit, that ballot has an understatement. Understatements do not call the outcome into question. If changing the interpretation according to the voting system to match the human interpretation would narrow any pairwise margin in any contest under audit, the ballot has an overstatement. If enough ballots have overstatements, the outcome could be wrong.

The sample size calculations for this method depend on the risk limit as well as the diluted margin, which is the smallest margin in votes divided by the number of ballots cast in any of the contests being audited together, including undervotes and overvotes. Undervotes and overvotes are included because they might have been intended as votes for candidates but misinterpreted by the voting system as undervotes or overvotes.

Efficient risk-limiting audits generally count votes by hand until there is convincing evidence that the outcome according to a full hand count would agree with the outcome under audit. If convincing evidence is not forthcoming, the audit progresses to a full hand count, which is used to correct the outcome under audit if the two disagree.

The tools on this page help with the following steps:

• Choose the number of ballots to audit initially
• Select a random sample of ballots
• Find those ballots using a ballot manifest
• Determine whether the audit can stop, given the differences between the machine and human interpretations of the ballots in the sample, and if not, estimate how many additional ballots will need to be audited

Visualizing the required sample size

The ultimate sample size required to confirm the outcome depends on the diluted margin and the number of errors (both understatements and overstatements) found in the sample, as well as the risk limit. The following graph plots the sample size as a function of the number of 1-vote overstatement errors the audit finds, for diluted margins of 0.5%, 1%, 5%, 10%, and 20%, all at risk limit 5%. It also plots the expected final sample size as a function of the diluted margin, for various rates of observed 1-vote overstatements. The plot assumes that there are no 2-vote overstatements and no understatements.

Expected sample size as a function of the diluted margin, 5% risk limit

Plot final sample size as a function of observed 1-vote overstatements

Initial sample size

The initial sample size tool lets you enter the particulars of the contest(s) to be audited as a group: the total ballots cast across all the contests combined, and the vote totals for each candidate in each contest. The form helps you anticipate the number of randomly selected ballots that will need to be compared to their CVRs to attain a given limit on the risk, under assumptions about the anticipated rates of differences. It is completely legitimate to sample one ballot at a time and check whether enough have been sampled using the "stopping sample size tool" (later on this page), but this form can help auditors anticipate how much work will be required and retrieve ballots more efficiently, by reducing the number of times a given batch of ballots is opened.

Enter the total number of ballots cast in all contests to be audited. Add candidates and contests as necessary until the results from all contests have been entered. Enter the desired risk limit and the expected rates of one-vote and two-vote differences. Select whether to round up the expected number of differences of each type. Finally, click "calculate" to find the starting sample size.

By default, this form assumes that the rate of one-vote understatements and overstatements is one in a thousand (0.001) and that the rate of two-vote understatements and overstatements is one in ten thousand (0.0001). These values are conservative, in my experience, but the choice is up to the user. The larger these rates are assumed to be, the larger the initial sample size will be. Taking a larger initial sample can avoid needing to expand the sample later, depending on the rate of errors the audit actually finds. Avoiding "escalation" can make the audit less complicated.

Considerations for deciding which contests to audit together

The number of ballots the audit must examine before stopping depends on the smallest diluted margin among the contests to be audited together (as well as the risk limit, the errors the audit finds, and so on). All else equal, the larger the diluted margin is, the smaller the sample size needs to be.

Because the diluted margin is the smallest margin in votes divided by the total number of ballots cast in all the contests under audit, auditing small contests together with large contests can be inefficient: Dividing the vote margin in small contests by the number of ballots cast in large contests can make the diluted margin very small, which makes the required sample very large.
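As a rough illustration of this definition, the following Python sketch computes a diluted margin from reported vote totals. The function name `diluted_margin`, the data layout, and all vote counts are hypothetical; they are chosen only to show how adding a small contest to a large one shrinks the diluted margin.

```python
def diluted_margin(contests, ballots_cast):
    """Smallest margin in votes divided by the ballots cast in the contests audited together.

    contests: list of (votes, n_winners) pairs, where votes maps candidate -> reported votes.
    ballots_cast: ballots cast in any of the contests, including undervotes and overvotes.
    """
    margins = []
    for votes, n_winners in contests:
        ranked = sorted(votes.values(), reverse=True)
        # The smallest pairwise margin in a contest is between the winner
        # with the fewest votes and the loser with the most votes.
        margins.append(ranked[n_winners - 1] - ranked[n_winners])
    return min(margins) / ballots_cast


# Hypothetical numbers: a jurisdiction-wide contest and a small local contest.
large = ({"A": 6000, "B": 4000}, 1)  # margin of 2,000 votes
small = ({"C": 650, "D": 350}, 1)    # margin of 300 votes

large_alone = diluted_margin([large], ballots_cast=10500)
small_alone = diluted_margin([small], ballots_cast=1000)
together = diluted_margin([large, small], ballots_cast=10500)

print(f"large contest alone: {large_alone:.4f}")
print(f"small contest alone: {small_alone:.4f}")
print(f"audited together:    {together:.4f}")
```

With these made-up totals, the small contest has a 30% diluted margin on its own, but auditing it together with the large contest divides its 300-vote margin by 10,500 ballots.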

Generally, if two contests overlap substantially—for instance, if both are jurisdiction-wide contests—it is more economical to audit them together: Fewer ballots will need to be inspected in all. Conversely, if two contests do not overlap at all, it is more efficient to audit them separately.

Auditing small contests together with (overlapping) large contests generally is not efficient unless the vote margin in the small contests is a substantial fraction of the vote margin in the large contests. That is, auditing small contests that have large percentage margins together with large contests that have small percentage margins can be efficient, but auditing small contests together with large contests that have comparable vote margins generally is not efficient, because it makes the diluted margin of the combination much smaller.

The tool above can be used to explore whether it makes sense to audit a collection of contests together by checking whether the required starting sample size when they are audited together is greater than the sum of the required starting sample sizes when they are audited separately. (If you experiment with different groupings of contests, be sure to change the entry for "Ballots cast in all contests" to reflect only the contests that are to be audited together.)
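The comparison described here can be sketched in Python. This is a simplified illustration, not the tool itself: it assumes the audit finds no differences at all, so the sample size reduces to -2g log(a)/m with g = 1.03905, and it rounds up to a whole ballot. The function name and all vote counts are hypothetical.

```python
import math

GAMMA = 1.03905  # the value of g used by the formulas on this page


def sample_size_no_differences(diluted_margin, risk_limit):
    """Ballots needed if the audit finds no differences (assumption: rounded up)."""
    return math.ceil(-2 * GAMMA * math.log(risk_limit) / diluted_margin)


RISK = 0.05

# Case 1 (hypothetical): two contests that appear on the same 10,000 ballots,
# with margins of 1,000 and 2,000 votes.
overlap_together = sample_size_no_differences(1000 / 10000, RISK)
overlap_separate = (sample_size_no_differences(1000 / 10000, RISK)
                    + sample_size_no_differences(2000 / 10000, RISK))

# Case 2 (hypothetical): the same margins, but the contests appear on
# disjoint sets of 10,000 ballots each, so 20,000 ballots are cast in all.
disjoint_together = sample_size_no_differences(1000 / 20000, RISK)
disjoint_separate = overlap_separate

print("overlapping: together", overlap_together, "vs separately", overlap_separate)
print("disjoint:    together", disjoint_together, "vs separately", disjoint_separate)
```

Under these assumptions the overlapping contests need fewer ballots when audited together, while the disjoint contests need fewer when audited separately.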

Technical notes

The initial sample size form implements this formula from AGI:

n0 = -2g log(a) / (m + 2g (r1 log(1 - 1/(2g)) + r2 log(1 - 1/g) + s1 log(1 + 1/(2g)) + s2 log(1 + 1/g)))

with m equal to the diluted margin, a equal to the risk limit, g = 1.03905, r1 the expected rate of 1-vote overstatements per ballot, r2 the expected rate of 2-vote overstatements per ballot, s1 the expected rate of 1-vote understatements per ballot, and s2 the expected rate of 2-vote understatements per ballot. The diluted margin is the smallest margin in votes, divided by the total number of ballots cast, including undervoted and overvoted ballots.

The number n0 is then adjusted to take into account the fact that the numbers of differences must be whole numbers, as follows: The expected number of differences in the sample of each type is n0 times the expected rate of those differences. Depending on which checkboxes are checked, the expected numbers are either rounded to the nearest whole number, or rounded up. Then those numbers of discrepancies are plugged into the stopping rule described below, to determine how many ballots would have to be audited if the estimated number of differences of each type were to be observed in the sample. That number is then used once again to estimate the number of differences of each type the sample would contain; the results are rounded to the nearest integer and plugged into the stopping rule a second time. The result is then the starting sample size.
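A minimal Python sketch of the unrounded formula for n0 follows. It covers only the first step; the two rounding passes through the stopping rule are not reproduced here. The function name `initial_sample_size` and the error raised when the denominator is not positive are assumptions made for illustration; the default rates are the form defaults described earlier on this page.

```python
import math

GAMMA = 1.03905  # the value of g used by this page


def initial_sample_size(m, a, r1=0.001, r2=0.0001, s1=0.001, s2=0.0001, g=GAMMA):
    """Unrounded n0 for diluted margin m, risk limit a, and expected difference rates."""
    denominator = m + 2 * g * (
        r1 * math.log(1 - 1 / (2 * g))    # 1-vote overstatements
        + r2 * math.log(1 - 1 / g)        # 2-vote overstatements
        + s1 * math.log(1 + 1 / (2 * g))  # 1-vote understatements
        + s2 * math.log(1 + 1 / g)        # 2-vote understatements
    )
    if denominator <= 0:
        # Assumption: treat this as "no finite sample size" for these rates.
        raise ValueError("expected difference rates are too high for this margin")
    return -2 * g * math.log(a) / denominator


# A 2% diluted margin at a 5% risk limit, with and without expected differences.
n0_no_differences = initial_sample_size(0.02, 0.05, r1=0, r2=0, s1=0, s2=0)
n0_default_rates = initial_sample_size(0.02, 0.05)
print(math.ceil(n0_no_differences), math.ceil(n0_default_rates))
```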

Random sampling

The next tool helps generate a pseudo-random sample of ballots. To start, input a random seed with at least 20 digits (generated by rolling a 10-sided die, for instance), the number of ballots from which you want a sample, and the number of ballots you want in the sample. Further below, there is a form to help find the individual, randomly selected ballots among the batches in which ballots are stored.

Pseudo-Random Sample of Ballots

The "seed," concatenated with a comma and the "Sample number," is passed through the SHA-256 hash function. The result is displayed as "Hashed value (for testing)". The hashed value, interpreted as a hexadecimal number, is divided by "Number of objects from which to sample." One is added to the remainder of that division to get "Randomly selected item," which will be a number between 1 and "Number of objects from which to sample," inclusive. Clicking "draw sample" successively adds one to "Sample number" and recomputes "Hashed value" and "Randomly selected item" "Draw this many objects" times. Selected items accumulate in "Ballots selected" (and "Ballots selected, sorted"), which reset if the seed or the number of objects changes. The same ballot might be selected more than once. Duplicates are removed in "Ballots selected, duplicates removed." Ballots selected more than once, and the frequencies of those ballots, are in "Repeated ballots." Clicking the "reset" button clears the history but leaves the seed.
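The procedure just described can be sketched in Python with the standard `hashlib` module. This is an illustration rather than the page's JavaScript: the function name `draw_sample`, the UTF-8 encoding of the input string, and starting the sample number at 1 are assumptions, so its output is not guaranteed to match the tool's. The seed shown is a placeholder, not a properly generated seed.

```python
import hashlib
from collections import Counter


def draw_sample(seed, n_objects, n_draws, first_sample_number=1):
    """Return n_draws items in 1..n_objects, selected with replacement."""
    selected = []
    for sample_number in range(first_sample_number, first_sample_number + n_draws):
        # Hash "seed,sample_number" and read the digest as a hexadecimal integer.
        message = f"{seed},{sample_number}".encode("utf-8")
        hashed_value = int(hashlib.sha256(message).hexdigest(), 16)
        # One plus the remainder gives a number between 1 and n_objects, inclusive.
        selected.append(hashed_value % n_objects + 1)
    return selected


seed = "12345678901234567890"  # placeholder; a real seed should come from dice rolls
ballots = draw_sample(seed, n_objects=1000, n_draws=20)

ballots_sorted = sorted(ballots)
duplicates_removed = sorted(set(ballots))
repeated = {ballot: count for ballot, count in Counter(ballots).items() if count > 1}

print("Ballots selected:", ballots)
print("Ballots selected, sorted:", ballots_sorted)
print("Ballots selected, duplicates removed:", duplicates_removed)
print("Repeated ballots:", repeated)
```

Because the draws depend only on the seed and the sample number, anyone who knows the seed can rerun the function and confirm the selection.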

I learned about this method of generating pseudo-random numbers from Ronald L. Rivest; it is related to a method described in http://tools.ietf.org/html/rfc3797. The SHA-256 hash algorithm produces hash values that are hard to predict from the input. They are also roughly equidistributed as the input varies. The advantages of this approach for election auditing and some other applications include the following:

• The SHA-256 algorithm is public, and implementations are available in many languages. (The JavaScript implementation used by this page was written by Brian Turek; the JavaScript routines for arithmetic on long integers, namely the SHA-256 hashed values, were written by Leemon Baird.)
• Given the seed, anyone can verify that the sequence of numbers generated was correct—that it indeed comes from applying SHA-256.
• Unless the seed is known, the sequence of values generated is unpredictable (so the result is hard to "game"). It is very hard to distinguish the output from independent, uniformly distributed samples.

For comparison, a reference implementation of this approach in Python written by Ronald L. Rivest is available at http://people.csail.mit.edu/rivest/sampler.py.

For more detail, see http://statistics.berkeley.edu/~stark/Java/Html/sha256Rand.htm.

Find ballots using a ballot manifest

Generally, ballots will be stored in batches, for instance, separated by precinct and mode of voting. To make it easier to find individual ballots, it helps to have a ballot manifest that describes how the ballots are stored. For instance, we might have 1,000 ballots stored as follows:

| Batch label | Ballots |
|---|---|
| Polling place precinct 1 | 130 |
| Vote by mail precinct 1 | 172 |
| Polling place precinct 2 | 112 |
| Vote by mail precinct 2 | 201 |
| Polling place precinct 3 | 197 |
| Vote by mail precinct 3 | 188 |

If ballot 500 is selected for audit, which ballot is that? If we take the listing of batches in the order given by the manifest, and we require that within each batch, the ballots are in an order that does not change during the audit, then the 500th ballot is the 86th ballot among the vote by mail ballots for precinct 2: The first three batches have a total of 130+172+112 = 414 ballots. The first ballot in the fourth batch is ballot 415. Ballot 500 is the 86th ballot in the fourth batch.
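The arithmetic in this example can be sketched in Python using cumulative batch sizes. The function name `locate` is hypothetical, and the sketch assumes the manifest lists only batch sizes, as in the table above.

```python
from bisect import bisect_left
from itertools import accumulate

manifest = [
    ("Polling place precinct 1", 130),
    ("Vote by mail precinct 1", 172),
    ("Polling place precinct 2", 112),
    ("Vote by mail precinct 2", 201),
    ("Polling place precinct 3", 197),
    ("Vote by mail precinct 3", 188),
]


def locate(ballot_number, manifest):
    """Return (batch label, position within the batch) for a 1-based ballot number."""
    # Running totals: the number of the last ballot in each batch.
    last_in_batch = list(accumulate(count for _, count in manifest))
    if not 1 <= ballot_number <= last_in_batch[-1]:
        raise ValueError("ballot number is outside the manifest")
    # The first batch whose last ballot number is at least ballot_number.
    index = bisect_left(last_in_batch, ballot_number)
    label, count = manifest[index]
    ballots_before = last_in_batch[index] - count
    return label, ballot_number - ballots_before


print(locate(500, manifest))  # ('Vote by mail precinct 2', 86)
```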

The ballot look-up tool transforms a list of ballot numbers and a ballot manifest into a list of ballots in each batch. Batch labels should not contain commas. Use a comma to separate each batch label from the number of ballots in that batch (or the range of ballot numbers or the set of ballot identifiers—see below). The manifest should have one line per batch and no empty lines.

For instance, to input the ballot manifest above, you would enter:

```
Polling place precinct 1, 130
Vote by mail precinct 1, 172
Polling place precinct 2, 112
Vote by mail precinct 2, 201
Polling place precinct 3, 197
Vote by mail precinct 3, 188
```

Some jurisdictions number the ballots cast in an election. If all the ballots in an election are numbered sequentially, the numbers on the ballots that contain a particular contest might not be sequential. For instance, an election might cover precincts 1, 2, and 3, but only voters in precincts 1 and 3 are eligible to vote in the contests to be audited with the current sample. In the previous example, suppose that the jurisdiction had stamped numbers on all the ballots, sequentially, so that the ballots from the polling place in precinct 1 were numbered 1 to 130, the vote by mail ballots from precinct 1 were numbered 131 to 302, the ballots from the polling place in precinct 2 were numbered 303 to 414, and so on, as summarized in the following table:

| Batch label | Ballot range |
|---|---|
| Polling place precinct 1 | 1 to 130 |
| Vote by mail precinct 1 | 131 to 302 |
| Polling place precinct 2 | 303 to 414 |
| Vote by mail precinct 2 | 415 to 615 |
| Polling place precinct 3 | 616 to 812 |
| Vote by mail precinct 3 | 813 to 994 |
| Provisional ballots for precinct 1 | 996, 998, 1000 |
| Provisional ballots for precinct 2 | 997 |
| Provisional ballots for precinct 3 | 995, 999 |

Since the ballots already have numbers on them, it makes sense to look them up using those numbers. If we were auditing a collection of contests that included only precincts 1 and 3, the ballots subject to audit would be the 686 ballots labeled 1 to 130, 131 to 302, 616 to 812, 813 to 994, and 995, 996, 998, 999, and 1000. In this case, the ballot manifest would include only the six batches that make up precincts 1 and 3, not all nine batches; there are only 686 ballots in these batches. Each line in the manifest would consist of a batch label and a range of ballot numbers, where the range is denoted by a colon, or of a batch label and a set of ballot identifiers in parentheses, separated by spaces. Ballot ranges cannot have gaps: There can be no missing numbers within the range for any single batch. (If there is in fact a gap, input the numbers as a set of identifiers, rather than as a range.) Again, separate the label from the range or set of ballot numbers by a comma. The label must not contain any commas, and the range of ballot numbers or set of identifiers must not contain commas. In this example, we would enter the ballot manifest as follows:

```
Polling place precinct 1, 1:130
Vote by mail precinct 1, 131:302
Polling place precinct 3, 616:812
Vote by mail precinct 3, 813:994
Provisional precinct 1, (996 998 1000)
Provisional precinct 3, (995 999)
```

The total number of ballots in the manifest must equal the number cast in the contests that are to be audited together using the sample (686 in this example).
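The manifest format with ranges and identifier sets can be parsed along the following lines. This Python sketch is not the page's look-up code: the function names `parse_manifest` and `look_up` are hypothetical, and it assumes that a sampled item number k refers to the k-th ballot in manifest order.

```python
MANIFEST = """\
Polling place precinct 1, 1:130
Vote by mail precinct 1, 131:302
Polling place precinct 3, 616:812
Vote by mail precinct 3, 813:994
Provisional precinct 1, (996 998 1000)
Provisional precinct 3, (995 999)
"""


def parse_manifest(text):
    """Return a list of (batch label, list of ballot identifiers)."""
    batches = []
    for line in text.strip().splitlines():
        # Labels and ballot specifications contain no commas, so there are two parts.
        label, spec = (part.strip() for part in line.split(","))
        if spec.startswith("("):
            # A set of identifiers in parentheses, separated by spaces.
            identifiers = [int(token) for token in spec.strip("()").split()]
        else:
            # A gap-free range written as first:last.
            first, last = (int(token) for token in spec.split(":"))
            identifiers = list(range(first, last + 1))
        batches.append((label, identifiers))
    return batches


def look_up(item, batches):
    """Map the item-th ballot in manifest order (1-based) to (label, identifier)."""
    if item < 1:
        raise ValueError("item numbers start at 1")
    for label, identifiers in batches:
        if item <= len(identifiers):
            return label, identifiers[item - 1]
        item -= len(identifiers)
    raise ValueError("item number exceeds the number of ballots in the manifest")


batches = parse_manifest(MANIFEST)
total = sum(len(identifiers) for _, identifiers in batches)
print(total)                  # 686
print(look_up(500, batches))  # ('Vote by mail precinct 3', 813)
```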

Should more ballots be audited?

The stopping sample size tool determines whether enough ballots have been examined for the audit to stop, and if not, estimates how many more ballots will need to be audited. The answer depends on the risk limit, the margin, and the differences between the cast vote records and the manual inspection of the ballots in the sample.

Differences matter according to how they affect the pairwise margin between some winner and some loser in some contest. Suppose we are auditing a mayoral contest with four candidates, a city council contest that allows voting for up to three of ten candidates, and a simple measure that involves voting either "yes" or "no." The mayoral contest has three pairwise margins: The winner can be paired with each of the three losers. The city council contest has twenty-one pairwise margins: each of the three winners can be paired with each of the seven losers. The measure has but one pairwise margin, since it has only one winner and one loser. In all, there are 3+21+1 = 25 pairwise margins among the three contests being audited.
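The count of pairwise margins in this example is the number of winners times the number of losers in each contest, summed over contests. A short Python check, with a hypothetical helper name:

```python
def pairwise_margins(n_candidates, n_winners):
    """Each winner is paired with each loser."""
    return n_winners * (n_candidates - n_winners)


# (number of candidates or positions, number of winners) for each contest
contests = {"mayor": (4, 1), "city council": (10, 3), "measure": (2, 1)}

counts = {name: pairwise_margins(*shape) for name, shape in contests.items()}
total = sum(counts.values())
print(counts, total)  # {'mayor': 3, 'city council': 21, 'measure': 1} 25
```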

If there is any difference between the cast vote record and the human interpretation of a ballot, that ballot as a whole may have an understatement of one or two votes, or an overstatement of one or two votes. No matter how many contests on the ballot have differences and no matter how many candidates in those contests have differences, the ballot as a whole has an understatement of one or two votes, or an overstatement of one or two votes, or neither an understatement nor an overstatement. (Of course, the sample might contain many ballots in each of these categories.)

If changing the interpretation of the ballot according to the voting system to make it match the human interpretation of the ballot would widen every pairwise margin in every contest under audit, that ballot has an understatement. If it would widen every pairwise margin in every contest by two votes, the ballot has a two-vote understatement; otherwise it has a one-vote understatement. If the ballot does not contain every contest under audit, it cannot have an understatement. Since there is an understatement only if changing the machine interpretation of the ballot to match the hand interpretation would increase every pairwise margin, understatements are quite rare. Understatements do not call the outcome into question, so they do not increase the sample size required to confirm the outcome.

If changing the interpretation of the ballot according to the voting system to match the human interpretation of the ballot would narrow any pairwise margin in any contest under audit, that ballot has an overstatement. If changing the interpretation of the ballot according to the voting system to match the human interpretation of the ballot would narrow any pairwise margin in any contest under audit by two votes, that ballot has a two-vote overstatement. No matter how many margins would be narrowed by one or two votes, the overstatement on a ballot is at most two votes, because only the maximum overstatement enters the calculations. If enough ballots have overstatements, the outcome could be wrong, so overstatements increase the sample size required to confirm the outcome.

As an example, suppose that we are auditing five contests simultaneously. Tables 1 and 2 below show two hypothetical CVRs and manual interpretations of the same ballots.

| | contest 1 | contest 2 | contest 3 | contest 4 | contest 5 |
|---|---|---|---|---|---|
| CVR | undervote | winner | loser | not on ballot | winner |
| Hand | loser | loser | winner | loser | not on ballot |
| discrepancy | 1 over | 2 over | 2 under** | 1 over | 1 over |
Table 1: Hypothetical CVR and hand interpretation of a ballot that contains four of five contests under audit. Overall, the ballot has an overstatement of 2 votes, because that is the largest overstatement of any margin in any of the contests.

**Contest 3 has an understatement of 2 votes only if the contest has only two candidates. If there are two or more losers in the contest (and only one winner), this contest has an understatement of only one vote, because only one pairwise margin was understated by two votes; the others were understated by one vote. Similarly, if there are two or more winners in the contest and only one loser, this contest has an understatement of only one vote. If there are at least two winners and at least two losers, there is no understatement in this contest, because at least one pairwise margin was not affected at all by the discrepancy. Regardless, the ballot has an overstatement of 2 votes, because the ballot has an overstatement of 2 votes in contest 2.

| | contest 1 | contest 2 | contest 3 | contest 4 | contest 5 |
|---|---|---|---|---|---|
| CVR | winner | winner | undervote | not on ballot | winner |
| Hand | overvote | undervote | loser | loser | not on ballot |
| discrepancy | 1 over | 1 over | 1 over | 1 over | 1 over |
Table 2: Hypothetical CVR and hand interpretation of a ballot that contains four of five contests under audit. Overall, the ballot has an overstatement of 1 vote, because that is the largest overstatement of any margin in any of the contests.
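The ballot-level rule illustrated by Tables 1 and 2 can be sketched in Python. The function name `ballot_discrepancy` and the data layout are hypothetical. The sketch assumes each of the five contests has one winner "W" and one loser "L", and it treats undervotes, overvotes, and contests not on the ballot as containing no valid votes.

```python
def ballot_discrepancy(contests, cvr, hand):
    """Signed discrepancy of one ballot: positive for an overstatement,
    negative for an understatement, zero for neither.

    contests: contest -> (set of winners, set of losers)
    cvr, hand: contest -> set of candidates with a valid vote on this ballot
    """
    worst = None
    for name, (winners, losers) in contests.items():
        machine = cvr.get(name, set())
        human = hand.get(name, set())
        for winner in winners:
            for loser in losers:
                reported = (winner in machine) - (loser in machine)
                actual = (winner in human) - (loser in human)
                # How much changing the CVR to match the hand count narrows this margin.
                overstatement = reported - actual
                worst = overstatement if worst is None else max(worst, overstatement)
    # Only the largest overstatement of any margin counts. The result is negative
    # only if every pairwise margin in every contest would widen.
    return worst


contests = {f"contest {i}": ({"W"}, {"L"}) for i in range(1, 6)}

# Table 1
cvr_1 = {"contest 1": set(), "contest 2": {"W"}, "contest 3": {"L"}, "contest 5": {"W"}}
hand_1 = {"contest 1": {"L"}, "contest 2": {"L"}, "contest 3": {"W"}, "contest 4": {"L"}}

# Table 2
cvr_2 = {"contest 1": {"W"}, "contest 2": {"W"}, "contest 3": set(), "contest 5": {"W"}}
hand_2 = {"contest 1": set(), "contest 2": set(), "contest 3": {"L"}, "contest 4": {"L"}}

print(ballot_discrepancy(contests, cvr_1, hand_1))  # 2
print(ballot_discrepancy(contests, cvr_2, hand_2))  # 1
```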

To determine whether the audit can stop, enter the number of ballots in the sample with overstatements or understatements of one or two votes, then click "Calculate." If the sample size is not large enough to confirm the outcome based on the number of differences of each type observed, the value of "If no more differences are observed" will be larger than the current sample size, and the value of "Estimated additional ballots if difference rate stays the same" will be greater than zero. That value is the estimated number of additional ballots that will need to be audited to confirm the outcome at the desired risk limit, assuming that the rates of one-vote and two-vote understatements and overstatements do not change as the sample expands.

If the contest being audited has more than two candidates or positions, the calculation above can be very conservative if overstatements do not affect the margin between the winner with the fewest votes and the loser with the most votes. The stopping rule given below can be modified to take that into account.

The stopping rule implements the following formula from AGI:

stopping sample size = -2g (log(a) + o1 log(1 - 1/(2g)) + o2 log(1 - 1/g) + u1 log(1 + 1/(2g)) + u2 log(1 + 1/g)) / m

with m equal to the diluted margin, a equal to the risk limit, o1 the number of 1-vote overstatements in the sample, o2 the number of 2-vote overstatements in the sample, u1 the number of 1-vote understatements in the sample, and u2 the number of 2-vote understatements in the sample. In the tool above, g = 1.03905, but any value greater than one can be used. For g = 1.03905, a two-vote overstatement increases the sample size by five times as much as a one-vote overstatement.
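A minimal Python sketch of this stopping rule follows. The function name `stopping_sample_size` is hypothetical, and rounding the result up to a whole ballot is an assumption of the sketch rather than something stated in the formula. The last lines check the factor of five for g = 1.03905 numerically.

```python
import math

GAMMA = 1.03905


def stopping_sample_size(m, a, o1=0, o2=0, u1=0, u2=0, g=GAMMA):
    """Ballots that must be audited before stopping, given the differences seen so far."""
    total = (
        math.log(a)
        + o1 * math.log(1 - 1 / (2 * g))  # each 1-vote overstatement raises the size
        + o2 * math.log(1 - 1 / g)        # each 2-vote overstatement raises it more
        + u1 * math.log(1 + 1 / (2 * g))  # understatements lower it
        + u2 * math.log(1 + 1 / g)
    )
    # Assumption: round up to a whole ballot, and never return a negative size.
    return max(0, math.ceil(-2 * g * total / m))


# A 2% diluted margin at a 5% risk limit.
clean = stopping_sample_size(0.02, 0.05)
one_over = stopping_sample_size(0.02, 0.05, o1=1)
two_over = stopping_sample_size(0.02, 0.05, o2=1)
print(clean, one_over, two_over)

# Ratio of the per-ballot penalties for 2-vote and 1-vote overstatements.
ratio = math.log(1 - 1 / GAMMA) / math.log(1 - 1 / (2 * GAMMA))
print(round(ratio, 3))
```

The audit can stop once the number of ballots audited so far is at least the returned value.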

The estimates based on differences continuing to occur at the observed rate are based on the method described above for estimating the initial sample size, including the method of rounding the expected number of differences of each type.