Reflections on the relationship between mathematical probability and the real world

Mathematics and models

A typical scientist might say the relationship between mathematics and science is clear. There are three steps:
(1) real-world setting -> formulation of a mathematical model [scientific knowledge or hypothesis]
(2) mathematical model -> predictions from the model [mathematics: theory or simulation]
(3) predictions from the model -> comparison with experimental or observational data [statistics, if needed]
Mathematics concerns the "theory" part of the second step. A typical theorem-proof mathematician might agree that their job is only to argue from assumptions to conclusions, and not to be concerned with anything outside mathematics:
Mathematics can be seen as a big warehouse full of shelves. Mathematicians put things on the shelves and guarantee that they are true. They also explain how to use them and how to reconstruct them. Other sciences come and help themselves from the shelves; mathematicians are not concerned with what they do or with what they have taken. Jean-Pierre Serre.
(I saw this quote in a thought-provoking 2018 U.K. report, "The Era of Mathematics", which decried such a "passive" attitude.) My experience is that when mathematicians say "science" they usually have physics in mind, as the science with precise mathematical laws.

Turning to probability models: even though the only mention of probability in Hilbert's problems was in the context of statistical mechanics, most uses of probability models are in contexts outside physics, where we do not expect the model to be precisely accurate. In mathematical probability, even the part called applied probability, a model is generally a set of stated rules for how some hypothetical system might vary in some random way, and one seeks to study the resulting behavior of that system. While parts of the theory often have names reflecting their origin (diffusion processes or birth-and-death processes, for example), among the 3,500 papers a year on such theory in MathSciNet very few have any relation to actual data. So this activity matches the Serre quote above, and perhaps also the following well-known quote.

As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from "reality" ... there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities. von Neumann, in "The Mathematician".
The famous quote from statistics is:
All models are wrong, but some are useful. Box and Draper.
(Though memorable, this quote strikes me as silly, in that it implicitly treats "wrong" and "useful" each as a yes/no alternative, whereas obviously there is a spectrum of accuracy and a spectrum of usefulness.) While a statistician will typically think of models as models for some particular data, mathematicians think in more abstract terms, as noted above. I prefer the phrasing:
probability models are fiction
because fiction spans a spectrum from pure fantasy to literary realism, and analogously probability models lie on a spectrum from fantasy, through toy models (which we don't pretend will give numerically accurate predictions), to models with verifiable numerical predictions. The latter I will call fact. As discussed elsewhere in this project, I once rather naively assumed that there were many interesting examples of such fact that I could explain to undergraduate students, but the total number remains embarrassingly small.

Anchored or unanchored?

Here is another take on modeling. Defining a model and studying its behavior via theorem-proof mathematics is one way of anchoring your work: to the whole body of rigorous mathematics. Defining a model and comparing its behavior with real-world data is another form of anchoring: to science. But there are academic fields where a substantial proportion of papers do neither. This activity is politely called methodology, though within my own fields of interest (e.g. quantitative aspects of networks) I impolitely call it low-level statistical physics.

Also, the culture of academic research in quantitative disciplines often encourages theoretical modeling that is never seriously compared with data. This point is emphasized both by Taleb in "The Black Swan" and by Piketty in "Capital in the Twenty-First Century". Individual papers in this unanchored style can be interesting, but when a whole topic is unanchored it is unlikely to be of lasting value.


For readers who have a mathematics or statistics background: a major part of this project is to emphasize that we perceive chance in many contexts where we do not use models (look through our extended overview page). So let me, rather lazily, write "perception" for what we are doing in all these other contexts.