The top ten things that math probability says about the real world

by David Aldous.

Talk given at Cornell University, April 14 2008. Like most scientists, I habitually just write slides and extemporize the actual spoken words. For this one talk I wrote the following script, though (it being hard to break the habits of 30 years) I didn't stick closely to the script. Each link is one slide.


Every academic discipline has its own peculiarities, and let me start by pointing out a peculiarity that my own topic (math probability) inherits from its parent, mathematics. Just about every university offers an undergraduate course in math probability, and as a student in such a course what you actually do is about 100 homework problems of this kind. The top ones are "just math" without any attempt at a real-world explanation, and the bottom two are weird made-up stories (poisonous mushrooms and car mufflers) referring to real-world items but not attempting to be empirically realistic. If you come from, say, Medieval History, you might think I've chosen some extreme example, but (alas!) different textbooks differ only in the relative proportions of these two genres.

Why is this peculiar? Let's take some other discipline, say Evolutionary Biology (which I have an amateur interest in). There's a similarity between math probability and Evolutionary Biology: relative to their parent disciplines, each has a substantial body of "popular science" literature. On my homepage I maintain a list of non-technical books relating to probability -- 50 or so books in different categories. One could make a similar list of popular books on Evolutionary Biology or 19th-century American History. For those subjects, I'm pretty sure there's considerable overlap between what's done in popular books and what's done in an undergraduate course -- in both contexts, surely one should start by deciding what's important and interesting, and then talk about it. But in math probability there's an almost complete disconnect between what we teach undergraduates and what writers of popular books think is important and interesting. What we teach is technique -- and you learn technique in math just as you learn it in tennis or oil painting, by spending time doing it yourself. Of course I've nothing against learning technique, but it's peculiar that we teach only technique. To give an analogy, reconstructing evolutionary trees from fossils often reduces to minute study of fossil teeth, and a professional paleontologist needs to learn this, but it would be odd to emphasize it in a first undergraduate course.

So here is my manifesto. What has interested me for a while is this: in at least some undergrad math probability courses, can we focus less on technique and more on what the theory actually says, in the sense of predictions for the real world? I've tried to do this both in freshman seminar courses and in a junior-senior seminar course. And you then realize it's hard to do -- it's just hard to decide what's actually important. So I finally forced myself to compose a top 10 list -- and here it is.

At least, here are 6 of them. These are the ones I'm not going to talk about, except briefly now.

Opinion polls, like airplanes, actually work rather well -- they only get into the news when things go wrong. I'm using opinion polls as a proxy for other textbook concepts like randomized controlled experiments -- we really do know how to determine whether medical drugs work, but again the subject only gets into the news when things go wrong.

Separating skill from luck in the aggregate is one conceptual aspect of the regression effect. The 5 baseball teams with the best records in 2007 were Boston, the NY Yankees, LA, Cleveland and Arizona. Without knowing anything about baseball, one could confidently bet that at least 3 of these 5 will have worse records in 2008 than in 2007. This is a predictable statistical effect: part of their success was chance.
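A toy simulation makes the regression effect concrete. It assumes, purely for illustration, that a season's score is a fixed team skill plus fresh independent luck, with skill and luck equally variable across 30 teams. The parameters and the helper `simulate_regression` are hypothetical, not a model of real baseball (real records are also zero-sum, which this ignores).

```python
import random
import statistics


def simulate_regression(n_teams=30, top_k=5, skill_sd=1.0, luck_sd=1.0,
                        n_seasons=2000, seed=1):
    """Toy model (hypothetical): a season's score = fixed team skill + fresh luck.

    Returns (average number of last season's top_k teams whose score drops
    next season, fraction of trials in which at least 3 of them drop).
    """
    rng = random.Random(seed)
    declines = []
    for _ in range(n_seasons):
        skill = [rng.gauss(0, skill_sd) for _ in range(n_teams)]
        year1 = [s + rng.gauss(0, luck_sd) for s in skill]
        year2 = [s + rng.gauss(0, luck_sd) for s in skill]
        # Indices of the top_k teams by year-1 score.
        top = sorted(range(n_teams), key=lambda i: year1[i], reverse=True)[:top_k]
        declines.append(sum(year2[i] < year1[i] for i in top))
    at_least_3 = sum(d >= 3 for d in declines) / n_seasons
    return statistics.mean(declines), at_least_3


mean_declines, frac_at_least_3 = simulate_regression()
print(f"average top-5 teams that decline: {mean_declines:.2f}")
print(f"trials where at least 3 of 5 decline: {frac_at_least_3:.0%}")
```

Setting `luck_sd=0.0` removes chance entirely, and then no top team declines: in this toy model the regression comes purely from the luck component.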

The Ludic Fallacy is a cute neologism (by Nassim Nicholas Taleb, about whom more later) for thinking that most real-world uncertainty is analogous to simple games of chance -- mostly it's not.

Injecting randomness refers to game theory; in a football game, the offense doesn't want opponents to know whether they're doing a passing or running play, so they implicitly put some randomness into the choice.
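How much randomness to inject can be computed in the simplest case. Below is a sketch for a hypothetical 2x2 zero-sum game between offense and defense with made-up expected-yardage payoffs. It uses the standard mixed-strategy solution for a 2x2 game without a saddle point; `mixed_strategy_2x2` is an illustrative name.

```python
from fractions import Fraction


def mixed_strategy_2x2(a, b, c, d):
    """Optimal mix for the row player in a 2x2 zero-sum game WITHOUT a saddle point.

    Payoffs to the row player (offense), e.g. expected yards:
                        defend run   defend pass
        row 1 (run)          a            b
        row 2 (pass)         c            d
    Returns (probability of playing row 1, value of the game).
    """
    a, b, c, d = map(Fraction, (a, b, c, d))
    # Choose p so the defense gets the same result whichever column it picks.
    p = (d - c) / (a - b - c + d)
    value = p * a + (1 - p) * c
    return p, value


# Hypothetical expected yards, invented for illustration.
p_run, value = mixed_strategy_2x2(a=2, b=6, c=10, d=3)
print(f"run with probability {p_run} (~{float(p_run):.2f}); "
      f"expected yards {float(value):.2f}")
```

With these invented numbers the offense should run about 64% of the time. Any predictable pattern would let the defense shift toward whichever play it expects.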

The fact that most letter strings (like JQTOMXDW KKYSC) have no meaning is what makes most simple letter substitution ciphers easy to break. In a hypothetical language in which every "monkeys on typewriters" string had a meaning, a letter substitution cipher would be impossible to break, because each of the 26 x 25 x 24 x ... x 2 possible decodings might be the true message. Now if you want to transmit or store information efficiently, you want every string to be possible as the coded string of some message (otherwise you're wasting an opportunity), and indeed you ideally want every string to be equally likely as the coded string of some message. This is "coding for efficiency", but with such an ideal public code one could simply apply a private letter substitution cipher on top and get an unbreakable "code for secrecy".
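For scale, the number of possible substitution keys is 26 x 25 x ... x 2 = 26!, far too many to try one by one. The cipher falls instead because almost all of those decodings are gibberish. A quick check of the size (the variable names are illustrative):

```python
import math

keys = math.factorial(26)   # 26 x 25 x ... x 2 x 1 possible substitution keys
bits = math.log2(keys)      # equivalent key length in bits
print(f"{keys:,} keys, about {bits:.1f} bits")
```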

The final item is more frivolous. Genetics is one of the classic applications of probability, but rather than attempting to say anything portentous about Evolution, here's a down-home calculation. You have 2 parents, 4 grandparents, 8 great-grandparents, and back 10 generations you have 1,024 slots in your family tree, filled by somewhat fewer distinct ancestors. How many of them have you actually inherited DNA from? Under a common simplified model, about 370 -- less than half. So even if you can prove that one of your ancestors was a relative of King George III, this doesn't mean you have royal blood.

Here are the 4 topics I will talk about today. Note that only 2 or 3 of these 10 ideas are treated in a typical undergrad course. Let's jump into the first topic:

Coincidences are more likely than you think.

Here's a textbook item. The famous birthday paradox says: with 23 people in a room, there's about a 50% chance that some two people have the same birthday. There's nothing magic about 23. For any number of people there's a formula for the chance of some shared birthday, it just turns out that to make this chance be 50% you need 23.
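The formula multiplies the chances that each successive person misses all the earlier birthdays. A minimal sketch, assuming 365 equally likely birthdays and independence (no leap days, no twins):

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Chance that among n people some two share a birthday,
    assuming birthdays are independent and uniform over `days` days."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


smallest = next(n for n in range(1, 366) if prob_shared_birthday(n) >= 0.5)
print(smallest, round(prob_shared_birthday(23), 4))   # 23 0.5073
print(round(prob_shared_birthday(25), 4))             # 25-man roster: 0.5687
```

With 30 major-league teams and roughly 25 players each, this predicts about 30 x 0.57, or around 17, teams with a shared birthday.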

That's a math prediction, and you can check how well it works with, e.g., baseball teams. I get kids to do this in a freshman seminar, but of course you could get elementary school kids to do it just as well. Baseball team rosters are easy to find online, happen to include players' birthdays, and happen to have about 25 players on an active roster. This is today's Toronto Blue Jays roster, and indeed we see two players -- Jeremy Accardo at the top and Vernon Wells second from bottom -- with the same birthday, December 8th.

This is a setting where you can check that math theory works -- the prediction is that slightly more than half the baseball teams have two players with the same birthday, and every time I've checked it, it works. But it's a long way away from the kind of real-life coincidences that people find striking. As illustrated by this sign one sometimes sees on church signboards, there's a long tradition outside mainstream science of assigning spiritual or paranormal significance to coincidences, by relating stories and implicitly or explicitly asserting that the observed coincidences are immensely too unlikely to be explicable as "just chance".

Rationalists dispute this, firstly by pointing out that (as illustrated by the birthday paradox) untrained intuition about probabilities of coincidences is unreliable, and secondly by asserting that observing events with a priori chances of one in a gazillion is not surprising because there are a gazillion possible other such events which might have occurred but didn't. In other words, it's not surprising when somebody wins the lottery, because there are millions of people who didn't.
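The "somebody wins" point is one line of arithmetic. Assuming, hypothetically, 10 million independent tickets each with a 1-in-10-million chance, the chance that at least one wins is 1 - (1 - p)^n:

```python
def chance_someone_wins(p_single: float, n_tickets: int) -> float:
    """Chance that at least one of n independent tickets wins,
    each with probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_tickets


# Hypothetical numbers: 1-in-10-million odds, 10 million tickets sold.
print(round(chance_someone_wins(1e-7, 10_000_000), 3))   # about 0.632
```

Each individual win is a one-in-ten-million event, yet a winner somewhere is more likely than not.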

I'm on the rationalist side, but must admit that rationalists have never provided actual evidence that interesting real-life coincidences occur no more often than "pure chance" predicts. That is, rationalists are stuck in the playground "oh no they're not" mode rather than offering empirical evidence. Their difficulty is that the things mathematicians can make predictions for are what I call "small universe" models (like the birthday paradox), where we specify in advance what coincidences might happen.

If we want to take one step toward real-life coincidences, I think we should pay attention to three features of real-life coincidences:

(i) coincidences are judged subjectively -- different people will make different judgements;
(ii) if there really are gazillions of possible coincidences, then we're not going to be able to specify them all in advance -- we just recognize them as they happen;
(iii) what constitutes a coincidence between two events depends very much on the concrete nature of the events. The birthday paradox has nothing mathematically to do with birthdays -- if there were 365 equally common hair colors the answer would be the same.

Here's my attempt to study a setting with these three features. Alas, this will mark me as a complete nerd. Let me seek some reassurance that I'm not the only complete nerd here, by asking for a show of hands on the following question. Be honest now.

Have you ever spent more than 2 minutes clicking on the Wikipedia "random article" link?

Click long enough and you'll see an article that is similar in some coincidental way to a previous article in the same session. So I did an experiment: on 28 different days I clicked and read until I noticed some unusual similarity between the current article and some previous one that day; then I wrote down both article titles and a brief description of the "specific coincidence" observed. Here are the results.

Of course Wikipedia isn't exactly "real life", but it does have the 3 essential features mentioned before, and what I'm doing is also a "repeatable experiment" -- we can actually do science here. In particular, after seeing a specific coincidence one can go back and interrogate Wikipedia to calculate the a priori chance of seeing that coincidence in two random pages. For instance, if you click "random article" twice, the chance that both pages are about Hindu religious figures is about 12 in 100 million.
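If "random article" picks pages uniformly and independently (an assumption), the chance that two clicks both land in a category holding k of N articles is (k/N)^2. Working backwards from the quoted 12 in 100 million gives the implied share of such articles. The function name is illustrative:

```python
import math


def chance_both_in_category(k: int, total: int) -> float:
    """Chance that two independent, uniformly random articles both lie in a
    category containing k of the `total` articles."""
    return (k / total) ** 2


# Work backwards from the quoted figure of 12 in 100 million.
chance = 12 / 100_000_000
share = math.sqrt(chance)   # implied share of articles in the category
print(f"implied share: {share:.6f}, i.e. about 1 article in {1 / share:,.0f}")
```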

This data helps us focus on the "gazillions" argument. We do see unlikely coincidences -- the figures are "per 100 million".

If you repeated the experiment then almost all your 28 coincidences would be different from my 28. On the other hand with a lot of work (in progress with undergraduates) one can go up from specific coincidences to "coincidence types" -- "people with similar occupations" -- in the table we see (aside from sports people and historical people) country music singers, children's lit authors, College Presidents, filmmakers. We can learn from examples and think exhaustively of other occupations, to estimate the chances of unseen coincidences of this general type. We hope to identify maybe 1/2 or 2/3 of "coincidence space", and in this way check the rationalist assertion. Of course Wikipedia is another "small universe" which won't impress spiritualists, but it's a step in an interesting direction.

100% Kelly strategy marks the boundary between aggressive and insane investing.

In an old joke, two men are running, pursued by a grizzly bear. One turns to the other and says "why are we running -- we can't outrun the bear!". The other replies: "I don't need to outrun the bear, I just need to outrun you!".

This thought applies to gambling (on horse races or football games, say) and to stock market type investments; you don't need a crystal ball that says what's going to happen, you just need to know the probabilities, and indeed you don't need to know probabilities exactly, you just need to know them better than other people.

Talking carefully about finance, risk and investment requires some prior discussion of issues like utility functions and risk aversion. The issue is this: compare in your mind the prospects of gaining $250 or losing $250. If your net worth is, say, $500,000, then $250 is so small that you should regard these as equal and opposite. That is, it's irrational to be risk-averse with small amounts. On the other hand, with $250,000 at stake it's perfectly rational to be more worried about the risk of losing half your wealth than pleased by the prospect of gaining the same amount. In fact people are predictably irrational in such matters, which is a different item on our top 10 list -- for instance, the majority of affluent Americans have irrationally small insurance deductibles (they're paying for something they don't need) -- but here I'll stick to rationality.
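One standard way to make this precise, assumed here for illustration, is logarithmic utility, which is the utility the Kelly criterion below turns out to maximize. The sketch compares gaining versus losing the same stake from a $500,000 net worth. The function name is illustrative:

```python
import math


def log_utility_change(wealth: float, stake: float) -> tuple[float, float]:
    """Change in log-wealth from gaining or from losing `stake`."""
    gain = math.log(wealth + stake) - math.log(wealth)
    loss = math.log(wealth - stake) - math.log(wealth)
    return gain, loss


for stake in (250, 250_000):
    gain, loss = log_utility_change(500_000, stake)
    print(f"stake ${stake:,}: gain {gain:+.6f}, loss {loss:+.6f}, "
          f"|loss|/gain = {abs(loss) / gain:.3f}")
```

For $250 the gain and loss are essentially equal and opposite (ratio close to 1). For $250,000 the loss hurts about 1.7 times as much as the gain helps.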

I'll talk about long-term investment. Imagine you fortunately inherit a sum of money at age 25 and you resolve to invest it and not start spending it until age 65. Here compounding plays a substantial role: if you could get a fixed real interest rate of 7% and ignore taxes, then the "rule of 70" says your money will double in about 10 years, and so double 4 times -- that is, grow to about 16 times the original -- over a 40-year period.
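The rule of 70 is an approximation to exact compounding. A quick check at the hypothetical fixed 7% real rate, ignoring taxes:

```python
import math

rate_percent = 7                                  # fixed real rate, taxes ignored
rule_of_70 = 70 / rate_percent                    # quick estimate of doubling time
exact_doubling = math.log(2) / math.log(1 + rate_percent / 100)
growth_40_years = (1 + rate_percent / 100) ** 40
print(f"rule of 70: {rule_of_70:.1f} years; exact: {exact_doubling:.2f} years")
print(f"growth over 40 years: {growth_40_years:.1f} times")
```

Exact doubling takes a little over 10 years, so 40 years gives about 15-fold growth rather than a full 16.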

What I'll talk about -- the Kelly criterion -- envisages a particular setting.

(i) You always have a choice between a safe investment (pays interest, no risk) and a variety of risky investments (suppose we know the probabilities of the outcomes of these investments).
(Of course in practice we don't know probabilities, so need to use our best guess instead.)

(ii) Fixed time period -- imagine a year, though it could be a month or a day -- at the end of which you take your gains/losses and start again with whatever you've got at that time ("rebalancing").

The Kelly criterion gives you an explicit rule for how to divide your investments to maximize the long-term growth rate. We can state the formula in an undergraduate class, but the conceptual point is rarely emphasized. To illustrate: short-term fluctuations of stock prices are "purely random" to first order, but quantitatively inclined speculators pore over data seeking non-randomness to exploit. Suppose you've found such a scheme, and we'll invent some numbers.

Here's the model. Each day there's a bet:

51% chance to double money
49% chance to lose all money.

This looks good -- the expected gain is 2% per day -- but you don't want to risk all your money on one day. Instead, use this strategy: bet a fixed proportion p of your capital each day. Theory says the long-term growth rate depends on p, but in an unexpected way.

The optimal strategy is to bet p = 2% of your capital each day; this provides a growth rate of 2/10,000 per day, which (with 250 trading days per year) becomes 5% per year.

If you're timid and bet a smaller proportion, then your growth rate will be smaller. More surprisingly, if you're too aggressive and bet a larger proportion, then your growth rate goes down too. If you consistently bet more than about 4% each day, you'd lose money in the long run. Greed isn't good here. Part of the explanation is an asymmetry between gains and losses: a gain of 50% followed by a loss of 50% works out as a 25% loss overall.
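These numbers can be checked directly. If you bet a fraction f of capital on the hypothetical 51/49 double-or-nothing bet, the expected change in log-wealth per day is 0.51 ln(1+f) + 0.49 ln(1-f). Log-growth is the quantity the Kelly criterion maximizes, and for small values it is close to the percentage rate. The sketch scans f on an illustrative grid:

```python
import math

P_WIN = 0.51  # hypothetical chance that the stake doubles (else it is lost)


def daily_log_growth(f: float, p: float = P_WIN) -> float:
    """Expected change in log-wealth per day when betting a fraction f
    of current capital on a double-or-nothing bet won with probability p."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)


# Scan f = 0.000, 0.001, ..., 0.060 (an illustrative grid).
grid = [i / 1000 for i in range(61)]
best_f = max(grid, key=daily_log_growth)
print(f"best fraction: {best_f:.3f}")
print(f"growth per year (250 trading days): {250 * daily_log_growth(best_f):.4f}")

# Too timid and too greedy both lose growth; past about 4% it turns negative.
for f in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(f"f = {f:.2f}: daily log-growth {daily_log_growth(f):+.7f}")
```

The optimum lands at f = 0.02, and the growth turns negative at about f = 0.04, roughly twice the optimal fraction.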

This 5% looks small -- you could get that in a bank -- but we're only betting a small proportion of our money and can keep the rest in the bank, so we're getting "5% above the risk-free interest rate".

The math above depends on hypothetical assumptions, but the Kelly criterion says quite generally how to divide your portfolio between different investments. The numbers for growth rates that come out of the formula of course depend on the numbers you put in, but there's one aspect which is "universal". In any situation where there are sensible risky investments, following the Kelly strategy means that you accept a short-term risk which is always of the same form: a 50% chance that at some time your wealth will drop to only 50% of what you started with. And the percentages match up in general: there's a 10% chance that at some time your wealth will drop to only 10% of what you started with. It's OK to be uncomfortable with this level of risk and to be less aggressive -- you get less long-term growth but with less short-term volatility. This is a matter of personal taste, like a $3 latte vs a $1 cup of joe.

The universality above is remarkable, but there's something more subtle here that in a sense is even more remarkable. One can make a rather loose analogy with the light-speed barrier. Common sense says objects can be stationary or move slowly or move fast or move very fast, and that there should be no theoretical limit to speed -- but physics says that in fact you can't go faster than the speed of light. And that's a very non-obvious fact. Similarly, we know there are risk-free investments with low return; by taking a little risk (risk here meaning short-term fluctuations) we can get a higher long-term reward. Common sense says this risk-reward trade-off spectrum continues forever. But in fact it doesn't. As a math fact, you can't get a higher long-term growth rate than you get from the "100% Kelly strategy". You're free to take more risk if you like excitement, but you don't benefit from it.
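Both the universal drawdown risk and the growth ceiling can be sketched with two standard formulas from the continuous-time (geometric Brownian motion) approximation to Kelly betting. They are assumed here rather than derived. If you bet c times the Kelly amount, the chance that wealth ever falls to a fraction x of its start is x^(2/c - 1), and the long-run growth rate is c(2 - c) times the full-Kelly rate. The function names are illustrative:

```python
def drawdown_probability(x: float, c: float = 1.0) -> float:
    """Chance that wealth EVER falls to fraction x (0 < x < 1) of its starting
    value when betting c times the Kelly amount (continuous-time approximation)."""
    if c >= 2.0:
        return 1.0  # zero or negative drift: the level is eventually hit
    return x ** (2.0 / c - 1.0)


def relative_growth(c: float) -> float:
    """Long-run growth rate as a fraction of the full-Kelly rate (same model)."""
    return c * (2.0 - c)


for c in (0.5, 1.0, 1.5, 2.0):
    print(f"{c:.1f} x Kelly: growth {relative_growth(c):.2f} of maximum, "
          f"P(ever down to 50%) = {drawdown_probability(0.5, c):.3f}")
```

Full Kelly (c = 1) reproduces the 50%/50% and 10%/10% matching. Half Kelly keeps 75% of the growth while cutting the chance of ever halving to 12.5%. Betting 1.5 times Kelly gives the same 75% growth at far higher risk, and at twice Kelly the growth vanishes, consistent with the 2% versus 4% figures in the example above.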

There's a rather nice book on this topic -- Fortune's Formula by William Poundstone -- and this is my review on my web page. He uses the slogan I've borrowed -- 100% Kelly strategy marks the boundary between aggressive and insane investing -- and it's a pretty accurate slogan.

In everyday life actions under uncertainty, people are predictably irrational.

There's a lot of psychology research (Amos Tversky et al.) describing experiments on "decisions under uncertainty". A good reference is R.S. Nickerson, Cognition and Chance: The Psychology of Probabilistic Reasoning.

Here's the most famous example: decisions can be strongly affected by how information is presented. Imagine a rare disease is breaking out in some community; if nothing is done, 600 people will die. There are two possible programs. For half our subjects, describe the alternatives as
(A) will save 200 people.
(B) will save everyone with chance 1/3 and save no-one with chance 2/3

For the other half of our subjects, describe the alternatives as
(C) 400 people will die
(D) no-one will die with chance 1/3; 600 people will die with chance 2/3.

Here's what happens.
Given the "A or B" choice, most people choose A.
Given the "C or D" choice, most people choose D.
Of course these are logically the same: A = C and B = D, but because they're presented differently, people react differently. If I say "your action will for sure save 200 people", you want to do that. If I say "400 people will die for sure because of your action", then you don't want to be responsible for that.
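Written as probability distributions over the number of people saved (out of 600), the four descriptions collapse to two, and all four have the same expected number saved. A small sketch using exact fractions (the dictionary layout is just one convenient encoding):

```python
from fractions import Fraction

TOTAL = 600  # people who die if nothing is done

# Each program as {number saved: probability}.
programs = {
    "A": {200: Fraction(1)},
    "B": {600: Fraction(1, 3), 0: Fraction(2, 3)},
    "C": {TOTAL - 400: Fraction(1)},                        # "400 will die"
    "D": {TOTAL - 0: Fraction(1, 3), TOTAL - 600: Fraction(2, 3)},
}


def expected_saved(dist):
    return sum(p * saved for saved, p in dist.items())


print("A same as C:", programs["A"] == programs["C"])
print("B same as D:", programs["B"] == programs["D"])
for name, dist in programs.items():
    print(name, "expected number saved:", expected_saved(dist))
```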

In my undergraduate course, students do course projects, and one option is to repeat some classic experiment. Here's a fun example.

Subjects: college educated, non-quantitative majors.
Equipment: bingo balls (1 -- 75) and 10 Monopoly $500 bills.
Draw balls one at a time; subject has to bet $500 on whether next ball will be higher or lower than last ball; prompt subject to talk (recorded) about thought process. Repeat for 5 bets.
Say: we're doing this one last time; this time you have the option to bet all your money. Prompt talk.

It's not clear to the subjects -- or even to you or me when we read this description -- what is the point of this experiment. The point is that we're interested, not so much in people's actions, as in their reasons for their actions.
In the first part, everyone behaves and explains rationally: if this ball is 43, then the next ball is more likely to be less than 43, so bet that way.
In the last part, what explanations do people give for their choice of whether or not to bet all their money? In our experiments, there was about a 50-50 split between

risk-aversion; good or poor chances to win
feeling (or having been) lucky or unlucky.

Conclusion: even when primed to think rationally, people have an innate tendency to revert to "luck" explanations.
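The rational rule in the first part of the experiment is just counting. A sketch, assuming balls are drawn uniformly at random without replacement from 1 to 75 (the function name is illustrative):

```python
def prob_next_lower(current: int, seen: set[int], n_balls: int = 75) -> float:
    """Chance that the next ball drawn (uniformly, without replacement) is lower
    than `current`, given which balls are already out of the cage."""
    remaining = [b for b in range(1, n_balls + 1) if b not in seen]
    lower = sum(1 for b in remaining if b < current)
    return lower / len(remaining)


# First draw is 43: 42 lower balls and 32 higher balls remain.
print(round(prob_next_lower(43, {43}), 3))   # 0.568
```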

My (non-expert) bottom line? We have a "medium-scale" understanding of how people think about uncertainty -- rational or irrational -- at least in a laboratory setting. That is, in maybe 15 settings like the two mentioned, we can abstract the structure of the situation and predict that in similar situations people will behave in the same rational or irrational ways.

But there's no consistency across all 15 settings. For any setting where we can predict people will be irrational in one direction, there's another setting where we can predict they'll be irrational in the other direction.

Bottom line: humans are not basically rational; different parts of our mental structure come into play in these different settings.

In predicting the future, remember inertia, trends, statistical fluctuation, unpredictable unique events

My final topic is more philosophical. Philosophically minded people have long debated to what extent human history is predictable or unpredictable.

Had Cleopatra's nose been shorter, the face of the world would have changed. (Pascal)

For quantitative people, arguing about ancient history isn't so appealing; a more precise question is "how well can you predict the future?", and a closely related question is "how well did past predictions work out?" Of course it's important to decide on a time scale -- predicting 2 days ahead is very different from predicting 200 years ahead -- so let's take 25 years.

Predict the state of the world in 2033
(or: how good were 1983 predictions for 2008?)

Of course we don't expect to be very accurate...

I'm not going to reveal any new crystal ball for making predictions, but I am going to share my intellectual tools for thinking about other people's predictions: inertia, trends, statistical fluctuations and unpredictable unique events. My point is that people making predictions tend to focus on just some of these, according to their own mind-set, and that 2 of these 4 tools are somewhat related to probability.

Inertia: In much everyday life we just assume aspects of the world will be unchanged in 25 years. Imagine a student deciding to be a doctor or a lawyer, or to learn French; or someone building an office building or planting a tree. In all these matters you're assuming that in 25 years there will still be doctors and lawyers and French-speakers and that people will work in office buildings, etc. Now I'm not disputing these examples. However, to relate a personal example, let me tell you the only bad advice I ever got from my father.

Son, don't bother to learn typing, soon enough you'll have a secretary to do that!

Ha-Ha.

Point: when you think about how the future might be different, you unconsciously assume that aspects you're not thinking about will stay the same.

Trends. It's easy to look back over 25 years and identify trends -- which trends you pick out depends on your own interests.

Moore's law
China's economic growth
continued slow increase in life expectancy

You can speculate on each trend: will it accelerate, continue, slow, stop or reverse?
You can speculate about new trends. Consider nanotech or virtual worlds. Today the number of people employed in nanotech is negligible compared to the number employed as janitors, and the time spent in Second Life is negligible compared to the time spent watching TV soap operas. This might change in 25 years.

How good were past predictions? Two examples:
1982 best-seller Megatrends (John Naisbitt) does a pretty good job
1972 "Limits to Growth" was embarrassingly wrong.

It's surprisingly hard to statistically estimate how good past predictions were, because of selection bias and survivorship bias: the predictions that are remembered or are easy to find tend to be those that were spectacularly right or wrong.

I don't have anything to say here other than the obvious: it's silly to ignore trends, and equally silly to blindly extrapolate trends.

Statistical fluctuations. Imagine you wake after 25 years of sleep, and decide to look at the 9,000 daily headlines you missed.

I say that's about as useful as looking at the results of dice throws, because you'll see

earthquakes, hurricanes, plane crashes
small wars
political sex scandals
partisan politics as a game
Dow drops 3%

Point: News presents itself as "change" but mostly isn't. These things are interesting to follow in real time, in the same sense that it's interesting to follow a football game in real time, but common sense says that the statistical distribution of most of these types of headlines over the next 25 years will be similar to that over the last 25 years. In fact it would be significant if any of these types of headlines did disappear.

Impact of unique unpredictable events. My final topic -- back to Cleopatra's nose -- the impact of unique unpredictable events -- is prompted by the recent bestseller The Black Swan: The Impact of the Highly Improbable by Nassim Taleb. He writes that a black swan is an event of

.... rarity, extreme impact, and retrospective (though not prospective) predictability. A small number of Black Swans explain almost everything in our world .....

I've spent a long time thinking of the right phrase to describe this book, and my current humorous phrase is "like The Da Vinci Code as written by Humpty Dumpty". That is, the author is widely read, has a fertile imagination and writes in a lively way, but the proposition as formulated above is rather absurd. Like Alice's Humpty Dumpty, the author uses words to mean whatever he wants at the moment, and like the nursery rhyme's Humpty Dumpty, the book resembles fragments that are hard to assemble into a coherent whole.

In particular, there is no attempt to make a quantitative assessment of "unique events versus trends" or versus other alternatives. But it would be interesting to do so.

Project for a young person: write down a long list (5,000?) of possible "Black Swans" for the next 10 years -- events generally reckoned to have a chance of less than 1%, say. Any science fiction fan like myself can rattle off 100 without pausing for thought! Wait 10 years. Then see: how many happened? How many others happened that you didn't think of? This would provide empirical evidence in place of idle speculation.
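A yardstick for reading the results of such a project: if each of 5,000 listed events independently had a chance of exactly 1% (both assumptions, since the listed chances are only "less than 1%"), the number that happen would follow a binomial distribution. The function name is illustrative:

```python
import math


def black_swan_yardstick(n_events: int = 5000, p: float = 0.01):
    """Expected number of listed events that happen, its standard deviation,
    and the chance that none happen, if each of n_events independently has
    chance p (assumptions for illustration)."""
    mean = n_events * p
    sd = math.sqrt(n_events * p * (1 - p))
    p_none = (1 - p) ** n_events
    return mean, sd, p_none


mean, sd, p_none = black_swan_yardstick()
print(f"expected {mean:.0f} +/- {sd:.1f} events; chance of none = {p_none:.1e}")
```

Under these assumptions about 50 of the listed "improbable" events should happen, so a count far from that range would be informative either way.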

Having been critical of Mr Taleb's book, it is only fair to give him the last word. I show this slide without comment, except to remark that it may be the only time you ever see the words "mathematics" and "expensive suits" in such close juxtaposition.