Information mediated by human choice is hard to assess

In textbooks, calculating a probability requires a model in which you prespecify all the relevant events that might happen and then implicitly assign a probability to every combination of happen/not happen. I think of these as "small universe" models. Textbook probability has limited applicability to everyday life partly because you really can't do this except in very limited contexts -- consider all the coincidences that might happen, for instance.

The concept of information has several different-but-related meanings in different fields of the mathematical sciences. One overview can be found in Information: A Very Short Introduction. In mathematical probability, information is formalized as events whose outcome (happen/not happen) is now known. At first sight one might think "a fact is a fact is a fact, and there's no more to say". But in practice the issue can be much more subtle, and here I discuss one particular aspect of this issue.

Today (June 27 2017) the BBC web site shows 6 articles with headlines such as "Google hit with record $2.7bn EU fine" and "US warns Syria on chemical attack plan". What is the information conveyed by reading these 6 headlines? It is not just these 6 facts, but also an almost infinite amount of implicit information about what did not happen. That is, we learn that all the possible events that would have been judged more important than these 6 did in fact not happen.

Here is another example discussed in more detail here.

The textbook answer would be the ratio
(number of families with twin boys, no other children) / (number of families with two boys, no other children)
which from empirical data is around 2%. This answer is wrong. The information we have is not "2 boys, no other children" but the specific form of the response, "yes, two boys". A person with twins might well have mentioned this fact in their response -- e.g. "yes, twin boys". And a person with non-twins might have answered in a way that implied non-twins, e.g. "Sam's in college and Jim's in high school". I can't imagine any empirical data that would give numerical probabilities for the vast range of possible responses, but my personal guess is that the former (indicating twins) is considerably more likely than the latter (indicating non-twins). If so, the plain response "yes, two boys" is itself evidence against twins, and as an oddsmaker I would announce a 0.5% chance.
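The arithmetic behind a figure like 0.5% can be sketched with Bayes' rule. This is only an illustration: the function name posterior_twins and the two reply probabilities (0.2 and 0.8) are hypothetical values chosen for the example, not data and not numbers from the discussion above. Only the 2% prior comes from the text.

```python
def posterior_twins(prior, p_plain_given_twins, p_plain_given_nontwins):
    """Bayes' rule: P(twins | the plain reply "yes, two boys")."""
    numerator = prior * p_plain_given_twins
    denominator = numerator + (1 - prior) * p_plain_given_nontwins
    return numerator / denominator

# Prior: the roughly 2% empirical ratio quoted in the text.
prior = 0.02

# ASSUMPTION (hypothetical numbers, not data): a parent of twins gives the
# plain reply 20% of the time (otherwise they mention "twins"), while a
# parent of non-twins gives the plain reply 80% of the time.
p_plain_given_twins = 0.2
p_plain_given_nontwins = 0.8

posterior = posterior_twins(prior, p_plain_given_twins, p_plain_given_nontwins)
print(round(posterior, 4))  # 0.0051

# If the form of the reply carried no information, the textbook 2% would stand.
print(round(posterior_twins(prior, 0.5, 0.5), 4))  # 0.02
```

Under these assumed values the plain reply is four times as likely from a non-twin parent, which is enough to pull the 2% down to about 0.5%. Different guesses about the reply probabilities would give different answers, which is exactly the difficulty.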

These two examples illustrate what I mean by information mediated by human choice. This setting, for instance when we encounter some information without prior thought, is in practice quite different from the "small universe" textbook setting of probability. In some contexts we immediately recognize the difference. In poker, we know that an opponent's hand after several rounds of betting is likely to be better than a newly-dealt hand, because they have chosen to remain in play. A more subtle example is family composition. Under the usual slightly-oversimplified model, the probability that a 4-child family has 2 girls and 2 boys equals 6/16. However, this assumes that "4 children" is predetermined, and ignores the possibility that the gender of existing children might influence the choice of having more children. In a hypothetical society where families continued until there was at least 1 boy and 1 girl, then stopped, there would be zero families with 2 girls and 2 boys. So there is a "human choice" effect in the distribution of family composition, though I do not know of data on the actual size of this effect.
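The family-composition contrast can be checked with a short sketch. It assumes, as in the oversimplified model, that each child is independently a boy or a girl with probability 1/2. The function name family_with_stopping_rule, the seed, and the sample size are illustrative choices, not part of the original discussion.

```python
import random
from fractions import Fraction
from itertools import product

# Fixed-size model: "4 children" is predetermined, so all 2**4 birth
# orders are equally likely.
orders = list(product("BG", repeat=4))
two_two = sum(1 for order in orders if order.count("B") == 2)
p_fixed = Fraction(two_two, len(orders))  # 6/16, reduced to 3/8

def family_with_stopping_rule(rng):
    """Hypothetical society: have children until there is at least
    one boy and one girl, then stop."""
    kids = []
    while not ("B" in kids and "G" in kids):
        kids.append(rng.choice("BG"))
    return kids

rng = random.Random(0)  # fixed seed so the run is repeatable
families = [family_with_stopping_rule(rng) for _ in range(100_000)]

# Among simulated families that ended up with exactly 4 children,
# count those with 2 boys and 2 girls.
four_child = [f for f in families if len(f) == 4]
n_two_two = sum(1 for f in four_child if f.count("B") == 2)

print(p_fixed)    # 3/8
print(n_two_two)  # 0
```

Under the stopping rule a 4-child family must be three children of one sex followed by one of the other, so the 2-2 split never appears, however many families are simulated.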