How to Tell if an Election Has Been Hacked

class: blueBack

## How to Tell if an Election Has Been Hacked

### Philip B. Stark
#### Department of Statistics, University of California, Berkeley
#### http://www.stat.berkeley.edu/~stark @philipbstark

### .white[Nerd Nite East Bay]
### Oakland, 26 November 2018

---

.center[<iframe width="800" height="450" src="https://www.youtube.com/embed/cruh2p_Wh_4" frameborder="0" allowfullscreen></iframe>]

---

### Arguments that US elections can't be hacked:

.medium[
- Physical security

- Not connected to the Internet

- Tested before election day

- Too decentralized
]

---

### Arguments that US elections can't be hacked:

.medium[
.red[
- Physical security

+ "sleepovers," unattended equipment in warehouses, school gyms, ...

+ locks use minibar keys

+ bad/no seal protocols, easily defeated seals
    
    + no routine scrutiny of custody logs, 2-person custody rules, ...
]

- Not connected to the Internet

- Tested before election day

- Too decentralized
]

---

### Arguments that US elections can't be hacked:

- .red.strikethrough[Physical security]

.red[
- Not connected to the Internet

+ remote desktop software
    
    + wifi, bluetooth, cellular modems, ...

+ removable media used to configure equipment & transport results

- Zip drives (now only available used)
    
        - USB drives. Stuxnet, anyone?
    
    + parts from foreign manufacturers, including China; Chinese pop songs in flash
]

- Tested before election day

- Too decentralized

---

### Arguments that US elections can't be hacked:

- .red.strikethrough[Physical security]

- .red.strikethrough[Not connected to the Internet]

.red[
- Tested before election day

- Dieselgate, anyone?
]

- Too decentralized

---

### Arguments that US elections can't be hacked:

- .red.strikethrough[Physical security]

- .red.strikethrough[Not connected to the Internet]

- .red.strikethrough[Tested before election day]

.red[
- Too decentralized

- market concentrated: few vendors/models in use

- vendors & EAC have been hacked
    
    - demonstration viruses that propagate across voting equipment
    
    - "mom & pop" contractors program thousands of machines, no IT security
    
    - changing presidential race requires changing votes in only a few counties
    
    - many weak links
]

---

## Evidence-Based Elections: 3 C's

.medium[
+ Voters .blue[_CREATE_] complete, durable, verified audit trail.

]
--

.medium[
+ LEO .blue[_CARES FOR_] the audit trail adequately to ensure it remains complete and accuirate.
]

.medium[
+ Verifiable audit .blue[_CHECKS_] reported results against the paper
]

---

---

## Risk-Limiting Audits

.medium[
+ Endorsed by NASEM, PCEA, ASA, LWV, CC, VV, ...
]

.medium[
+ Large chance of requiring a full hand count of the votes, if
that would show the outcome is wrong. (Full hand count corrects wrong outcomes.)
]

.medium[
+ Most efficient options: **ballot-polling** and **ballot-level comparison**
]
    
---
.medium[
+ 255 state-level pres. races, 1992–2012, 10% risk limit

- BPA expected to examine **fewer than 308 ballots** for half.
]

.medium[
+ 2016 presidential election, 5% risk limit

- BPA expected to examine **~700k ballots nationally** (\$<0.5\$%) 
]

---

#### Ballot-Polling v. Ballot-Level Comparison, 2 Candidates, 10% Risk Limit

| Winner's share | median BPA  |  90th %tile BPA | comparison audit |
|:---------- | ------: | ------: | -------:| -------:|
|70%  | 22  | 60   |  8 |
|65%  | 38  | 108    | 17 | 
|60%  | 84  | 244   | 25 | 
|58%  | 131  | 381  | 31 |      
|55%  | 332  | 974  | 50 |  
|54%  | 518  | 1,520 | 63 | 
|53%  | 914  | 2,700 |  83 |    
|52%  | 2,051 | 6,053 | 125  |
|51%  | 8,157 | 24,149 | 250 |
|50.5% | 32,547| 96,411 | 500 |

---

## Risk-Limiting Audits

.medium[
+ ~30 pilot audits in CA, CO, IN, OH, VA, DK; MI 12/2018

+ CO law in effect; RI 2019; CA has pilot laws
]

![CO Attorney General 2018](./Audit16/co-atty-gen-18.png)

---

## RLA pseudo-algorithm

.large[
```
    while (!(full handcount) && !(strong evidence outcome is correct)) {
        audit more
    }
```
]

.large[

```
    if (full handcount) {
        handcount result is final
    }
```
]

.medium[Chance RLA won't correct wrong outcome \$\le \alpha\$, pre-selected **risk limit**.

"Wrong" means full handcount would belie it.
]

---

.medium[
Magic shop claims coin will land heads 2/3 of the time.
]

.medium[Can you test whether chance of heads is > 1/2?]

.blue.medium[HHHHHHHHHH …]

.red.medium[Is that strong evidence that the coin isn't fair?]

.medium[What about

.blue[THHHHHHTTH …]?
]

---

.medium.blue[
+ According to shop:

- \$\Pr(\mathrm{H}) = 2/3\$
    
    - \$\Pr(\mathrm{T}) = 1/3\$
]

.medium.red[
+ If coin is fair:

- \$\Pr(\mathrm{H}) = 1/2\$
    
    - \$\Pr(\mathrm{T}) = 1/2\$
]

---

SPRT

\$t = 1\$

.medium[After each toss, ]

$$ t \leftarrow t \times \frac{\Pr(\mbox{outcome if coin is fair})}{\Pr(\mbox{outcome if shop is right})}$$

--
.medium[
+ If toss lands heads, \$t \leftarrow t \times \frac{1/2}{2/3} = t \times 1/3\$
]

.medium[
+ If toss lands tails, \$t \leftarrow t \times \frac{1/2}{1/3} = t \times 3/2\$
]

.blue.medium[Theorem (Wald, 1945):] If the coin is fair, the chance that \$t\$ ever gets below \$p\$ is at most \$t\$

---

### _P_-values

.blue[HHHHHHHHHH]: \$t=(1/3)^{10} = 0.000017 \$

.blue[THHHHHHTTH]: \$t=(3/2)^3(1/3)^7 = 0.0015 \$

---

## Coins to ballots

.medium[Box model if store is telling the truth:]

```
    |   1    |
    | 0   1  |
    |________|
```
--

.medium[Box model if coin is fair:]

```
    |        |
    | 0   1  |
    |________|
```
---

.medium[Consider names instead of numbers:

+ "1" = vote for the reported winner
+ "0" = vote for the reported loser

]

.medium[If winner really got 2/3 of the votes:]

```
    |      winner     |            |   0         1   |
    | loser   winner  |            |   0         1   |
    |      winner     |            |   0         1   |
    | loser   winner  |     -->    |             1   |
    |      winner     |            |             1   |
    | loser   winner  |            |             1   |
    |_________________|            |_________________|
```

.medium[If outcome was really a tie:]

``` 
    | loser    winner |            |   0        1    |
    | loser    winner |            |   0        1    |
    | loser    winner |    -->     |   0        1    |
    | loser    winner |            |   0        1    |
    |_________________|            |_________________|
```

---

### More than two candidates, undervotes, overvotes, ...

+ .medium[test one reported (winner, loser) pair at a time]

+ .medium[condition on showing a vote for one member of the pair]

+ .medium[no multiplicity issue: conjunction, not union]

---

---

---

---

---

---

---

.blue.medium[Markov's inequality]

.medium[If \$X\$ is a non-negative random variable, then
for all \$a > 0\$,]

$$ \Pr \\{ X \ge a \\} \le \mathbb{E}X/a.$$

$$\mathbb{E}X \ge (1-p) \times 0 + p \times a $$

$$  p \le \mathbb{E}X/a $$

---

.blue.medium[Kolmogorov's inequality]

.medium[If \$\\{X_j \\}\$ is a non-negative Martingale,
then for all \$a > 0\$,]

$$ \Pr \\{ \sup_j X_j \ge a \\} \le \mathbb{E}X_1/a.$$

I.e., Markov's inequality applies simultaneously to every term in a non-negative Martingale.

.medium[Likelihood ratios are non-negative Martingales wrt denominator.]

---
#### "Super-simple" ballot-level comparison audits via Kolmogorov's inequality

$$   n \ge \frac{ a + b o_1 + c o_2 - d u_1- e u_2}{m}$$

.medium[
\$a, b, c, d\$ depend on risk limit & desired operating characteristics

\$o_1, o_2\$ one-vote and two-vote overstatements

\$u_1, u_2\$ one-vote and two-vote understatements

\$m\$ "diluted" margin (includes all ballots)
]

---

## Sampling ballots: requirements

.medium[

+ ballots (25% of US voters don't have)

+ ballot manifest

+ good, transparent, verifiable source of randomness
    - 20 public rolls of transluscent 10-sided dice

+ quality, replicable PRNG
    - SHA256-based cryptographically secure PRNG, open source

$$ X_j = \mathrm{int}(\mathrm{SHA}(\mathrm{seed},j))$$

]

---

![Napa dice](./Audit16/napa-rivest-stark.jpg)