Use and abuse of toy models

The phrase toy model has several interpretations. Wikipedia describes it as "a deliberately simplistic model with many details removed so that it can be used to explain a mechanism concisely". I would say
a simplistic model which we don't pretend will give numerically accurate predictions but instead gives insight whether some mechanism might possibly explain some qualitative observation.

A good example

An example of a useful toy model is the Wright-Fisher model in population genetics. To quote the Wikipedia page
Think of a gene with two alleles, \(A\) or \(B\). In diploid populations consisting of \(N\) individuals there are \(2N\) copies of each gene. An individual can have two copies of the same allele or two different alleles. We can call the frequency of one allele \(p\) and the frequency of the other \(q\). The Wright-Fisher model assumes that generations do not overlap and that each copy of the gene found in the new generation is drawn independently at random from all copies of the gene in the old generation.
At first sight this model -- offspring having random parents -- looks silly, but in fact is mathematically similar to more sensible "parents have offspring" models. The main unrealistic feature is to neglect geography, that is the fact that parents will not be random picks from a widely spread population but will be physically close.

Setting aside that issue, the key conceptual point is that the model refutes an objection to "evolution by natural selection" that Darwin (unaware of basic genetic theory) couldn't answer. Observation of animal breeding might suggest that offspring characteristics are a mixture of their parents', like a mixture of blue and yellow paint makes green paint. Of course this couldn't be the whole story, or eventually every individual in a population would be almost identical by heredity, but (unaware of genetics) one might imagine heredity working as "mixture of parents, plus individual randomness". And indeed this kind of "additive" model does roughly predict the behavior of some real-world quantitative characteristics, for instance height in humans. But the general "heredity is like paint mixing" analogy faced a fundamental objection in Darwin's time; a new favorable characteristic arising once by mutation would be diluted in subsequent generations and the advantage would effectively disappear. In a simple mathematical model, for a population of size \(N\) the characteristic would need to re-arise very many (order \(N\)) times before becoming established in the population.

But in the Wright-Fisher model (modified to allow mutations), if a new allele has a small selective advantage \(a\), that is if it leads to an average of \(2(1 + a)\) instead of 2 offspring, then there is a chance of order \(a\) that from a single mutation the new allele will spread throughout the entire population regardless of how large the population. So a new mutation would only need to re-arise a comparatively small number of times.

OK, the model and my story above are absurdly over-simplified, but do serve to show that "evolution by natural selection" is mathematically possible, at least if there are discrete entities underlying heredity. And these are the "factors" (now called genes) that Mendel has postulated earlier.

Misuses of toy models

1. The main misuse is via an implicit appeal to dubious logic:
we know mechanism A would cause effect B; we observe effect B; we infer that A is indeed the mechanism causing the effect.
The point is that there might be many alternative possible mechanisms that would also cause the observed effect. If the effect is highly specific and quantitative then the argument is plausible, but if it is just a qualitative effect then the argument is unconvincing.

A particular field where this misuse is rife involves power laws for observed data. Here I am not envisaging purely physical phenomena but rather data from the human world. There are many quite different models which lead to power laws (see this 2004 paper by Michael Mitzenmacher for a few such models). Hundreds of papers have been written since 2000 by authors inventing a model and claiming their model "explains" this or that occurence of power law data; all such claims should be met with extreme skepticism.

As a caricature of this issue, consider the 7-parameter superformula which can reproduce an impressive range of shapes by choice of parameters. Then consider No Man's Sky in which any of 2^64 quite realistic-looking planets and ecosystems can be created deterministically from a general 64-bit sequence. The fact that you can reproduce data using a mathematical model does not imply that the model indicates the actual mechanism.

A more pertinent example is provided by a web page Minority Rules: Scientists Discover Tipping Point for the Spread of Ideas which reports on a published paper. It starts with an impressive-looking graphic and the sentence

Scientists at Rensselaer Polytechnic Institute have found that when just 10 percent of the population holds an unshakable belief, their belief will always be adopted by the majority of the society.
But careful scrutiny, or a look at the actual paper, shows this is a "toy model" in which everything has been made up; there is absolutely no actual data from the real world. And the notion that (except in very exceptional contexts such as this) you can get numerical values like 10 percent out of such data-free toy models is absurd. All that one can conclude from this study is a conceptual point, that simplistic models like this for phenomena like "spread of opinions" often have tipping points. And this has been well known to mathematicians for a long time. As math theory this style of work -- fleshing out details of conditions under which certain behavior holds -- is fine, but claims about direct real-world relevance are just "bad science".

2. A second issue can also be illustrated by variants of the Wright-Fisher model. As well as incorporating mutation and selection effects, one wants models which incorporate "spatial structure" of the population, and one can readily invent simple such models which represent a next level of "toy model"; then one could also add migration, and so on. This had led to many hundreds of papers, and monographs such as this. This is how academic theory proceeds, and as with the basic Wright-Fisher model it is interesting to learn what the possible real-world behaviors are by studying the behaviors of mathematical models. But there is the danger of implicitly coming to believe a kind of Platonic fallacy; that one can get closer to Reality by Pure Thought. This issue, in applied mathematics, parallels a sentence from a famous 1947 article by von Neumann, who was thinking of theorem-proof mathematics.

As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from "reality" … there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities.
In the (purportedly real-world) context of modeling of social networks as random graphs, this effect was satirized in Cosma Shalizi's famous Three-Toed Sloth blog (April 19, 2016) as
a field in which the realities of social life are first caricatured into an impoverished formalism of dots and lines, devoid even of visual interest and incapable of distinguishing the real process of making movies from a mere sketch of the nervous system of a worm, and then further and further abstracted into more and more recondite stochastic models.
Instead of moving outwards from known results, I prefer an opposite approach to research. Start by considering problems that are way outside what is currently understood, and then try to work inward by slowly simplifying -- call this the the external DLA approach.

An example of my own

Here is an example of a setting where one can devise and study a toy model, from this techical paper of mine.
Imagine you are the 100th person in line at an airport security checkpoint. As people reach the front of the line they are being processed steadily, at rate 1 per unit time. But you move less frequently, and when you do move, you typically move several units of distance, where 1 unit distance is the average distance between successive people standing in the line. This phenomenon is easy to understand qualitatively. When a person leaves the checkpoint, the next person moves up to the checkpoint, the next person moves up and stops behind the now-first person, and so on, but this ``wave" of motion often does not extend through the entire long line; instead, some person will move only a short distance, and the person behind will decide not to move at all. Intuitively, when you are around the \(k\)'th position in line, there must be some number \(a(k)\) representing both the average time between your moves and the average distance you move when you do move -- these are equal because you are moving forwards at average speed \(1\). In other words, the number \(W\) of people who move at a typical step has distribution \(\Pr(W \ge k) = 1/a(k)\). This immediately suggests the question of how fast \(a(k)\) grows with \(k\). In this paper we will present a stochastic model and prove that, in the model, \(a(k)\) grows as order \(k^{1/2}\). This is roughly consistent with our own (very limited) data.

Conclusion

It's good to think of the simplest possible model of any interesting real-world phenomenon and study its math properties. Even if this simplest model is known to be completely wrong, it's worth brief mention as contrast to the right model (cf. "paint mixing" above, or phlogiston). Of course toy models are mostly useful when truly realistic models are not known or are very complicated to analyze, and where one is seeking the modest goal of showing that some observed qualitative behavior is possibly explicable by some specified mechanism. But the temptation to over-claim that this constitutes as actual real-world explanation must be resisted.