- Answers will be scored objectively -- no subjective judgments (as would be needed for creative writing, for instance).
- Contestants who end with a better overall score will -- beyond reasonable doubt -- be better at the subject matter of the quiz.
- The questions refer to substantive real-world matters, rather than fantasy scenarios (islands with liars and truth-tellers) or self-referential questions ("how would most other contestants answer this question?").
- No person (or computer, etc) knows or will ever know the correct answer to any of the questions.

- Before 8 September 2018, will the Council of the European Union adopt a directive on taxation of digital business activities?
- Will the WHO declare a Public Health Emergency of International Concern (PHEIC) before 1 September 2018?
- Before 8 September 2018, will Poland, Estonia, Latvia, or Lithuania accuse Russia of intervening militarily in its territory without permission?
- Before 8 September 2018, will India sign either of the two remaining defense foundational agreements with the U.S.?

The point is that no-one will ever know the correct probabilities on a given day. Nevertheless, as outlined below, it is possible to objectively measure participants' relative ability to assess such probabilities, after the outcomes are known. So this fits the rules for my puzzle.

In a prediction tournament there will be a large number \(n\) of events, with unknown
probabilities \((q_i, 1 \le i \le n) \) and with forecasts
\((p_{A,i}, p_{B,i}, \ldots, 1 \le i \le n) \) chosen by participants \(A, B, \ldots\).
We **would like to** measure how good a participant is by the average squared-error of their forecast probabilities
\begin{equation}
\mathrm{MSE}(A) = \frac{1}{n} \sum_i (p_{A,i} - q_i)^2 .
\label{def-MSE}
\end{equation}
But this is impossible to know, because we don't know the \(q\)'s, the true probabilities.
However, a little algebra shows that, for the final scores (each the average of that participant's scores on the individual events),
\begin{eqnarray}
E[\mbox{final score (A)}] - E[\mbox{final score (B)}]
&=& \mathrm{MSE}(A) - \mathrm{MSE}(B) .
\label{average-3}
\end{eqnarray}
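The algebra is short. Write \(X_i \in \{0,1\}\) for the outcome of event \(i\), so that the squared-error score on that event is \((p_{A,i} - X_i)^2\). Then
\begin{eqnarray*}
E[(p_{A,i} - X_i)^2] &=& q_i (1 - p_{A,i})^2 + (1 - q_i) \, p_{A,i}^2 \\
&=& (p_{A,i} - q_i)^2 + q_i (1 - q_i) .
\end{eqnarray*}
Averaging over events,
\begin{eqnarray*}
E[\mbox{final score (A)}] &=& \mathrm{MSE}(A) + \frac{1}{n} \sum_i q_i (1 - q_i) ,
\end{eqnarray*}
and the final term is the same for every participant, so it cancels from the difference.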
Now your actual final score is random, but by a "law of large numbers" argument,
for a large number of events it will be close to its mean.
Informally,
\begin{eqnarray*}
\mbox{final score (A)} &=& E[\mbox{final score (A)}]
\pm \mbox{ small random effect} .
\end{eqnarray*}
Putting all this together,
\begin{eqnarray*}
\mathrm{MSE}(A) - \mathrm{MSE}(B) &=&
\mbox{final score (A)} - \mbox{final score (B)}
\pm \mbox{ small random effect} .
\end{eqnarray*}
Now we are done:
the MSEs are our desired measure of skill, and from the observed final scores we can tell the relative skills of the different participants,
up to a small amount of luck.
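As a numerical sanity check on this argument, here is a small simulation (a sketch; the error sizes and number of events are invented for illustration):

```python
import random

random.seed(0)
n = 100_000  # many events, so the "small random effect" is small

# Unknown true probabilities; A forecasts with smaller errors than B.
q = [random.uniform(0.05, 0.95) for _ in range(n)]
p_A = [min(max(qi + random.gauss(0, 0.05), 0.0), 1.0) for qi in q]
p_B = [min(max(qi + random.gauss(0, 0.15), 0.0), 1.0) for qi in q]

# Outcomes: event i occurs with probability q_i.
outcomes = [1 if random.random() < qi else 0 for qi in q]

def final_score(p):
    """Observable final score: average squared error against outcomes."""
    return sum((pi - xi) ** 2 for pi, xi in zip(p, outcomes)) / n

def mse(p):
    """Unobservable in practice: requires the true probabilities q."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / n

# The observable score difference tracks the unobservable MSE difference.
print(final_score(p_A) - final_score(p_B))
print(mse(p_A) - mse(p_B))
```

The two printed differences agree to within the "small random effect", even though neither forecaster's absolute score reveals their MSE.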

- A term \(\frac{1}{n}\sum_i q_i(1-q_i)\) from irreducible randomness. This is the same for everyone, but we don't know the value.
- Your individual MSE, where "error" is (your forecast probability - true probability).
- Your individual luck, from randomness of outcomes.

- The par score.
- The typical amount you score over par (your *handicap*, in golf language).
- Your luck on that round.
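This decomposition of a final score can be checked with a short simulation (a sketch; the numbers are invented):

```python
import random

random.seed(1)
n = 200_000

# Invented true probabilities, forecasts, and outcomes.
q = [random.uniform(0.1, 0.9) for _ in range(n)]
p = [min(max(qi + random.gauss(0, 0.1), 0.0), 1.0) for qi in q]
outcomes = [1 if random.random() < qi else 0 for qi in q]

score = sum((pi - xi) ** 2 for pi, xi in zip(p, outcomes)) / n  # observable
par = sum(qi * (1 - qi) for qi in q) / n      # irreducible term, same for everyone
handicap = sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / n  # this forecaster's MSE
luck = score - par - handicap                 # whatever is left over

print(score, par + handicap, luck)  # luck is small for large n
```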

of the events estimated as having a 60-70% chance, about 60-70% should actually occur (and similarly for other ranges) -- is bad because you can game the system via dishonest announcements.
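To see the gaming problem concretely, consider a forecaster who dishonestly announces the overall base rate for every event (a sketch; the numbers are invented). They come out perfectly calibrated, yet their score exposes the lack of skill:

```python
import random

random.seed(2)
n = 50_000

# Half the events have true probability 0.2, half 0.8.
q = [random.choice([0.2, 0.8]) for _ in range(n)]
outcomes = [1 if random.random() < qi else 0 for qi in q]

honest = q[:]              # announces the true probabilities
base_rate = sum(q) / n     # close to 0.5 here
lazy = [base_rate] * n     # announces ~50% for everything

def final_score(p):
    """Average squared error against outcomes (lower = better)."""
    return sum((pi - xi) ** 2 for pi, xi in zip(p, outcomes)) / n

# The lazy forecaster passes the calibration test: of events forecast
# at ~50%, about 50% actually occur.  But their score is far worse.
occurred = sum(outcomes) / n
print(occurred, base_rate)                     # nearly equal: "calibrated"
print(final_score(honest), final_score(lazy))  # honest scores much lower
```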

To me it is self-evident that one should make predictions about uncertain future events in terms of probabilities rather than Yes/No predictions. So it is curious that, outside of gambling-like contexts, this is rarely done. Indeed, almost the only everyday context in which one sees numerical probabilities expressed is the chance of rain tomorrow. A major inspiration for current interest in this topic has been the work of Philip Tetlock. His 2006 book *Expert Political Judgment: How Good Is It? How Can We Know?* examines extensive data on how good geopolitical forecasts from political experts have been in the past (short answer: not very good). That book contains more mathematics along the "how to assess prediction skill" theme of this article.

What makes some people better than others at forecasting, and can we learn from them? That is the topic of Tetlock's bestselling 2015 book *Superforecasting: The Art and Science of Prediction*, which reports in particular on an IARPA-sponsored study of an earlier prediction tournament, in which participants were assigned to teams and encouraged to discuss the questions with teammates. Their conclusions relate success both to the cognitive style of individuals and to team dynamics.

My own extended account of this topic, with a little math, appears as a few sections in my ongoing lecture notes.