class: blueBack ## Classical Statistics in Modern Elections ### Philip B. Stark #### Department of Statistics, University of California, Berkeley #### http://www.stat.berkeley.edu/~stark @philipbstark ### .white[Conference in Honor of Prof. Yoav Benjamini's 70th Birthday] ### Jerusalem, Israel, 17--20 December 2018 Collaborators: Jorge Bañuelos, Josh Benaloh, Mike Byrne, Steve Evans, Alex Halderman, Mike Higgins, Doug Jones, Phil Kortum, Eric Lazarus, Mark Lindeman, Neal McBurnett, Katie McLaughlin, Luke Miratrix, Kellie Ottoboni, Olivier Pereira, Ron Rivest, Peter Ryan, Vanessa Teague, Poorvi Vora, Dan Wallach, Vincent Yates, & many election officials ---  --- .center[
] ---  ---  ---
---
---
--- https://www.intelligence.senate.gov/sites/default/files/documents/ICA_2017_01.pdf  --- .large.center[Trump: 304 electoral votes] -- .large.center.blue[270 to win] -- .center.large.red[304-270 = 35] -- |State | Margin (votes) | margin (%) | paper? | electoral votes | |---------|-------------|------------|--------|-----------------| | Florida | 112,911 | 1.20% | mixed | 29 | | Michigan | 10,704 | 0.23% | yes | 16 | | Pennsylvania | 44,292 | 0.72% | mostly no | 20 | | Wisconsin | 22,748 | 0.77% | yes | 10 | -- .center.red[Errors in a few dozen precincts in MI and PA could alter outcome] --- + Candidate who challenges results likely to be characterized as "sore loser" - pressure to concede on election night, before large fraction of votes have been counted + No legal basis to call for an audit if the state doesn't have audit law + Hoped to put pressure on Governors, State & Local election officials: petition with >450k signatures --- ## Stein's recount efforts: PA, MI, WI -- ### PA + Not much paper to recount: none in Philadelphia, Pittsburgh + Required affidavits from 3 people who voted _in each precinct_ + Huge bond required + Tried, but judge rejected before it really started ---
--- ### WI + Stein had standing to request recount + Recount law allows re-scanning or manual tally - re-scanning is like asking same doctor for 2nd opinion + Stein sued to compel manual tally + Judge agreed manual is preferable, but law says jurisdictions can choose ---
---  ---  ---  ---  ---  ---  ---
---
--- .medium[Georgia, 2018] + Suit against GA SoS Kemp, candidate for Governor in hotly contested race v. Stacey Abrams + Judge Totenberg agreed paper ballots should be used, but was convinced GA could not transition in time + Voting systems and registration system wide open since at least 2016; no action even after vulnerabilities discovered + Anomalous undervote rates in Lt. Gov race + (massive disenfranchisement by Kemp--de-registering voters, rejecting absentee ballots, etc.) ---
--- .medium[Arguments that US elections can't be hacked:] .medium[ - Physical security - Not connected to the Internet - Tested before election day - Too decentralized ] --- .medium[Arguments that US elections can't be hacked:] .medium[ .red[ - Physical security + "sleepovers," unattended equipment in warehouses, school gyms, ... + locks use minibar keys + bad/no seal protocols, easily defeated seals + no routine scrutiny of custody logs, 2-person custody rules, ... ] - Not connected to the Internet - Tested before election day - Too decentralized ] --- .medium[Arguments that US elections can't be hacked:] - .red.strikethrough[Physical security] .red[ - Not connected to the Internet + remote desktop software in tabulation systems + wifi, bluetooth, cellular modems, ... + removable media used to configure equipment & transport results - Zip drives (now only available used) - USB drives. Stuxnet, anyone? + parts from foreign manufacturers; Chinese pop songs in flash + 2007 Ohio EVEREST study; 2007 California Top-to-Bottom Review + 2017 and 2018 DefCon - hardwired passwords such as "password" and "pasta" - _every_ device hacked ] - Tested before election day - Too decentralized --- .medium[Arguments that US elections can't be hacked:] - .red.strikethrough[Physical security] - .red.strikethrough[Not connected to the Internet] .red[ - Tested before election day - Dieselgate, anyone? ] - Too decentralized --- .medium[Arguments that US elections can't be hacked:] - .red.strikethrough[Physical security] - .red.strikethrough[Not connected to the Internet] - .red.strikethrough[Tested before election day] .red[ - Too decentralized - market concentrated: few vendors/models in use - vendors & EAC have been hacked - demonstration viruses that propagate across voting equipment - "mom & pop" contractors program thousands of machines, no IT security - changing presidential race requires changing votes in only a few counties - many weak links ] ---
--- Why not just count by hand? Contests in Berkeley, 11/2018 .small[ 1. Governor 1. Lt. Governor 1. Secretary of State 1. State Controller 1. State Treasurer 1. Attorney General 1. Insurance Commissioner 1. State Board of Equalization 1. U.S. Senator 1. U.S. Representative 1. State Senate 1. State Assembly 1. State Superintendent of Public Instruction 1. Tax Assessor/Collector 1. Community College District Trustee 1. Berkeley School Directors 1. Rent Stabilization Board 1. Alameda County Transit District board 1. Bay Area Rapid Transit District board 1. East Bay Municipal Utilities District board 1. East Bay Regional Park District board 1. Berkeley City Council 1. Berkeley City Auditor 1. Measure G 1. Measure O 1. Measure P 1. Measure Q 1. Measure R 1. Measure FF ] --- .medium[Procedure-based elections] Use certified equipment; follow laws & regulations. --
.medium[Evidence-based elections] Provide convincing evidence that the announced winners really won. --- ## Evidence-Based Elections: 5 C's + _Create_ durable, trustworthy record of voter intent - ideally, hand-marked paper ballots (BMDs for voters who benefit from them) - w BMDs, voter responsible for catching machine errors - if system can mark ballot without voter seeing, not voter-verifiable + _Care_ for the paper record - verifiable chain of custody, 2-person custody rules, ballot accounting, good seal protocols, etc. + _Compliance_ audit: establish whether paper trail is trustworthy - ballot accounting, including VRDB, pollbooks, etc. - check chain of custody logs, video, etc. - eligibility audits + _Check_ reported outcome against the paper + _Correct_ the reported outcome if it is wrong --- ### The Detection Paradigm (<2007)
.framed.blue[Audit a large enough sample (of ballots or precincts) randomly to ensure a large chance of finding at least one tabulation error if the outcome is wrong.] -- But any audit of a substantial number of hand-marked paper ballots likely to find at least one tabulation error. -- Then what? --- ### Risk-Limiting Audits (2007)
.framed.blue[ Large chance of requiring a full hand count of the votes, if that would show the outcome is wrong. (Full hand count corrects wrong outcomes.) ] -- + .medium[Formulate audit as hypothesis test. Null hypothesis: outcome is _wrong_.] -- + .medium[Sequential tests for efficiency when outcome is correct.] -- + .medium[Details tailored to jurisdictional logistics, equipment capability, social choice function, etc.] ---
.medium[ + Endorsed by NASEM, PCEA, ASA, LWV, CC, VV, ... ] --- .medium[Key statistical tools] + Maximizing $P$-values over composite nulls + Sequential tests derived from Kolmogorov's inequality + Fisher's combining function for independent tests (haven't tried Simes) --- .medium[Kolmogorov's Inequality] If $\\{X_j\\}$ is a nonnegative martingale, $$ \Pr \\{ \sup_j X_j \ge a \\} \le \mathbb{E} X_1/a. $$ -- Likelihood ratios are nonnegative martingales with respect to the distribution in the denominator. -- If hypothesis in the denominator of the LR is true, $$ \Pr \\{ LR \ge 1/p \\} \le p $$ Special case of Wald's SPRT. --- Fisher's combining function $\chi(P) \equiv -2 \sum\_{s=1}^S \ln P\_s$. -- If $\\{P\_s \\}\_{s=1}^S$ IID $U[0,1]$, $\chi(P)$ has the chi-square distribution with $2S$ degrees of freedom. -- If $\\{P\_s \\}\_{s=1}^S$ independent w/ $\\Pr \\{P\_s \le p \\} \le p$, $p \in [0, 1]$, then $\chi(P)$ is stochastically smaller than the chi-square distribution with $2S$ degrees of freedom. --- .medium[Simplest RLA: unstratified ballot-polling w/ replacement] Single-winner, vote-for-one plurality contest: $w$ really won if $w$ got more votes than every other candidate. $N$ ballots $N_c$ votes for candidate $c$ $V_c$ reported votes for $c$ -- $ \pi_{cd} \equiv N_c/(N_c+N_d) $ $ s_{cd} \equiv V_c/(V_c+V_d) $ -- $w$ reported winner; $\mathcal{L}$ reported losers -- $w$ really won if $\pi_{w\ell} > 1/2, \;\; \forall \ell \in \mathcal{L} $ -- Draw a ballot at random. $$ \Pr \\{ \mbox{ballot shows vote for } w | \mbox{ballot shows vote for } w \mbox{ or } \ell \\} = \pi_{w\ell} $$ -- Test $\pi_{w\ell} \le 1/2, \ell \in \mathcal{L}$ Alternative $ \\{\pi\_{w\ell} = s_{w\ell} \\}$ -- Audit stops iff all nulls are rejected or there is a full manual tally. --- Ingredients: ballot manifest; risk limit $\alpha$; $N$, $\\{V_c\\}$, $\mathcal{C}$ + $T\_{w\ell} \leftarrow 1, \forall \ell \in \mathcal{L} $; $\mathcal{U} = \mathcal{C}$ -- + Draw ballot at random from manifest; inspect manually - If undervote or invalid ballot, draw again - If ballot cannot be found, $T\_{w\ell} \rightarrow T\_{w\ell}\times 2(1-s_{w\ell}), \forall \ell \in \mathcal{U}$ - If vote for $w$, $T\_{w\ell} \rightarrow T\_{w\ell} \times 2 s\_{w\ell}, \forall \ell \in \mathcal{U}$ - If vote for $\ell$, $T\_{w\ell} \rightarrow T\_{w\ell}\times 2(1- s\_{w\ell}) $ -- + For each $\ell \in \mathcal{U}$, if $T_{w\ell} \ge 1/\alpha$, $ \mathcal{U} \leftarrow \mathcal{U} \setminus \ell $ -- + If $\mathcal{U} = \emptyset$, stop; else, draw another ballot or perform a full manual tally. --- + 255 state-level pres. races, 1992–2012, 10% risk limit - BPA expected to examine **fewer than 308 ballots** for half. -- + 2016 presidential election, 5% risk limit - BPA expected to examine **~700k ballots nationally** ($<0.5$%) --- ### Ballot-polling w/o replacement $H\_\ell: N\_w - N\_\ell \le 0$. Nuisance parameter: $N_u$ votes for other candidates, invalid votes. -- $H\_{\ell k}: N\_w - N\_\ell \le 0, N\_u = k$. -- Then $H\_\ell = \cup\_{0 \le k \le N} H\_{\ell k}$. -- Test each $H_k$ at level $\alpha$; overall test has level no larger than $\alpha$: at most one $H_k$ can be right. $H\_{\ell kd}: N\_w - N\_\ell = d\_k, N\_u = k$. -- Felicitous cancellations in likelihood ratio. Audit stops if likelihood ratio of $H\_{\ell kd}$ to $H\_{\ell k}$ $\ge 1/\alpha$ for all $k$ and $\ell$. --- .medium[Bernoulli sampling] + allows audit to begin in precinct; de-couples work + Horvitz-Thompson estimator and normal approximation not conservative + conditional on attained sample size, can treat as SRS + can treat SRS, randomly permuted, as if it were sequential sample: reduction to SPRT + in practice use geometric skipping + if 2016 U.S. Presidential results are right, BBP w/ rate 1% has 99% chance of confirming outcome in 42 states at 5% risk limit --- ### Comparison audits + export voting system tallies for subtotals + check that subtotals yield reported results + check subtotals at random. Accurate enough to determine who really won? -- Like checking expense report. --- Social choice functions + plurality, vote-for-$k$, majority, super-majority, proportional (incl D'Hondt), IRV, Borda, approval + electoral college, Indian electoral system + multiple contests with a single sample --- Strategy: stop only if there's strong evidence that every winner really beat every loser. -- Reduction: was any _pairwise margin_ overstated by 100% or more? -- Scalarize: maximum across-contest relative overstatement of pairwise margins. $$ e\_b \equiv \max\_c \max\_{w \in \mathcal{W}\_c \ell \in \mathcal{L}\_c} (v\_{bw}-a\_{bw} - v\_{b\ell} + a\_{b\ell})/V\_{w\ell}. $$ All reported outcomes are correct if $$ E \equiv \sum\_{b=1}^B e\_b < 1. $$ A priori upper bound $u\_b$ for $e\_b$ from number of ballots & reported votes in group $b$. Reduces the problem to testing a hypothesis about the mean of a nonnegative population. --- .medium[Dollar unit sampling & taint] $t\_b \equiv \frac{e\_b}{u\_b} \le 1$. Define $U \equiv \sum\_b u\_b$. Draw batches at random with replacement, with probability $u\_b/U$ of drawing batch $b$. Let $T_j$ be the value of $t_b$ for the batch $b$ selected in the $j$th draw. $\\{T\_j\\}\_{j=1}^n$ are IID, $\Pr \\{T\_j \le 1\\} = 1$, and $$ \mathbb{E} T\_1 = \sum\_b \frac{u\_b}{U} t\_b = \frac{1}{U}\sum\_b u\_b \frac{e\_b}{u\_b} = \frac{1}{U} \sum\_b e\_b = E/U. $$ Relates overstatement error to expected value of IID bounded RVs. --- .medium[Testing $\mathbb{E} T \ge 1/U$] Idea in Wald (1945) via H. Kaplan $X_1, X_2, \ldots$ nonnegative, IID $F$. $\mu \equiv \mathbb{E} X\_1 = \int_0^\infty x dF$. -- If $\mu = t$, $$ t^{-1} \mu = t^{-1} \int_0^\infty xdF(x) = \int_0^\infty xt^{-1} dF(x) = 1. $$ -- Fix $\gamma \in [0, 1]$. $$ \mathbb{E} \left ( (\gamma/t) X_j + (1-\gamma) \right ) = (\gamma/t) \mathbb E X_j + (1-\gamma) = (\gamma/t)t + (1-\gamma) = 1. $$ Set $$ dG \equiv (x \gamma/t + (1-\gamma))dF $$ $G$ is the cdf of a nonnegative random variable, and (by Jensen) $$ \int_0^\infty x dG \ge \int_0^\infty x dF.$$ --- $$ \mbox{LR} = \frac{dG(x_j)}{dF(x_j)} = \frac{(x_j\gamma/t+(1−\gamma))dF(x_j)}{dF(x_j)} = x_j\gamma/t+(1−\gamma). $$ Doesn't depend on $F$ or $G$. -- $\mbox{LR}$ for $X\_1, \ldots, X\_m$ is $$ \mbox{LR} = \prod\_{i=1}^m \left [ (\gamma/t) X\_i + 1 - \gamma \right ]. $$ --- Not just LRs; can use other nonnegative martingales. -- $$ \int\_{\gamma=0}^1 \prod\_{i=1}^n \left [ (\gamma/t) X\_i + 1 - \gamma \right ] d\gamma. $$ is also a nonnegative martingale w expected value 1 if $\mu = t$. Resulting test has good power over broader range of alternatives. Polynomial in $\gamma$ of degree at most $n$, with constant term $1$. Term-by-term integration unstable: recursion. --- ### Without replacement Finite population of $N$ non-negative items, $\{x_1, \ldots, x_N\}$ $\mu \equiv \frac{1}{N} \sum_{j=1}^N x_j$. Draw $X\_1, X\_2, \ldots$ sequentially without replacement. Let $S\_j \equiv \sum\_{k=1}^j X_k$, $\tilde{S}\_j \equiv S\_j/N$, and $\tilde{j} \equiv 1 - (j-1)/N$. Define $$ Y\_n \equiv \int\_0^1 \prod\_{j=1}^n \left ( \gamma \left [ X\_j \frac{\tilde{j}}{t - \tilde{S}\_{j-1}} -1 \right ] + 1 \right ) d\gamma. $$ If $\mu=t$, $(Y\_j )\_{j=1}^N$ is a nonnegative closed martingale with expected value 1. By Kolmogorov's inequality, for $J \in \{1, \ldots, N\}$, $$ \Pr \left ( \max\_{1 \le j \le J} Y_j(t) \ge 1/p \right ) \le p. $$ --- .medium[Stratification] + logistically useful to stratify by mode of voting, by county, by when ballots are processed, etc. + current best method based on Fisher's combination -- + $S$ strata. + overstatement of margin of $w$ over $\ell$ in stratum $s$ is $\omega\_{w\ell,s}$ + $w$ really beat $\ell$ if $\sum\_s \omega\_{w\ell,s} < V\_{w\ell}$. + if not, exists $(\lambda\_s)\_{s=1}^S$, with $\sum\_s \lambda\_s = V\_{w\ell}$ and $\omega\_{w\ell,s} \ge \lambda\_s$ for all $s$. + null is union over $\lambda$ of intersection across strata -- + for each $\lambda$, test intersection using Fisher's function to combine stratum-level sequential $p$-values for $\omega\_{w\ell,s} \ge \lambda\_s$ + audit stops if reject hypotheses for all $w \in \mathcal{W}$, $\ell \in \mathcal{L}$, $\lambda$ --- .medium["Super-simple" ballot-level comparison audit] Audit can stop if $$ n \ge \frac{ a + b o_1 + c o_2 - d u_1- e u_2}{m}$$ -- $a, b, c, d$ depend on risk limit & desired operating characteristics $o_1, o_2$ one-vote and two-vote overstatements $u_1, u_2$ one-vote and two-vote understatements $m$ "diluted" margin (includes all ballots) -- Hand arithmetic: transparency --- #### Ballot-Polling v. Ballot-Level Comparison, 2 Candidates, 10% Risk Limit | Winner's share | median BPA | 90th %tile BPA | super-simple | |:---------- | ------: | ------: | -------:| -------:| |70% | 22 | 60 | 8 | |65% | 38 | 108 | 17 | |60% | 84 | 244 | 25 | |58% | 131 | 381 | 31 | |55% | 332 | 974 | 50 | |54% | 518 | 1,520 | 63 | |53% | 914 | 2,700 | 83 | |52% | 2,051 | 6,053 | 125 | |51% | 8,157 | 24,149 | 250 | |50.5% | 32,547| 96,411 | 500 | ---
---
--- .medium[ + ~35 pilot audits in CA, CO, IN, MI, NJ, OH, VA, DK; RI 1/2019 + ballot polling, batch-level comparisons, ballot-level comparison, stratified + pending federal legislation ] --  --- .medium[More wrinkles] Simplicity/transparency/verifiability as design criteria Open-source software Accounting for errors in the ballot manifest: ~2EZ Preserving anonymity when publishing CVRs: SOBA Equipment limitations, transitive audits Mixed-mode audits: SUITE --- ## Requirements .medium[ + paper ballots (25% of US voters don't have) + verifiable chain of custody; compliance audits + ballot manifest (for BBP, list of batches) + good, transparent, verifiable source of randomness - 20 public rolls of translucent 10-sided dice + quality, replicable PRNG - SHA256-based cryptographically secure PRNG, open source $$ X_j = \mathrm{int}(\mathrm{SHA}(\mathrm{seed},j))$$ + escalation rules, coordination for cross-jurisdictional contests, etc. + public verifiability ] ---  ---