Is this prediction roughly accurate? The table below shows data collected by Amy Huang and Irvin Liu as an Undergraduate Research Project in Spring 2009. I haphazardly picked a few numbers out of the 81 possible.
trillion dollars | 1.4 | 2.8 | 3.3 | 4.2 | 4.7 | 5.6 | 8.4 |
observed frequency | 26 | 29 | 19 | 13 | 10 | 4 | 5 |
What conclusion might we draw? In brief, the data (excluding the anomalous first figure) shows a very crude fit to the predicted 1/x frequency. There are overlapping conceptual and practical issues in obtaining this data, so we're reluctant to undertake quantitative analysis or draw any more definite conclusion. Instead we invite the reader to consider better ways to formulate and execute the project!
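For readers who want to see what "a very crude fit" looks like numerically, here is a minimal sketch (not part of the original project) that scales 1/x weights to the same total as the observed counts. It assumes that dropping the first figure and normalizing over the remaining six figures is a reasonable reading of "predicted 1/x frequency"; the function name expected_counts is our own.

```python
# Figures and counts copied from the table above.
amounts = [1.4, 2.8, 3.3, 4.2, 4.7, 5.6, 8.4]   # trillion dollars
observed = [26, 29, 19, 13, 10, 4, 5]


def expected_counts(xs, counts):
    """Scale 1/x weights so they sum to the same total as the observed counts.

    Assumption: the prediction is read as "frequency proportional to 1/x",
    normalized over only the figures actually searched for.
    """
    weights = [1.0 / x for x in xs]
    total = sum(counts)
    norm = sum(weights)
    return [total * w / norm for w in weights]


# Drop the first figure (1.4), which the text treats as anomalous.
xs, obs = amounts[1:], observed[1:]
exp = expected_counts(xs, obs)

for x, o, e in zip(xs, obs, exp):
    print(f"{x:4.1f} trillion: observed {o:2d}, expected under 1/x {e:5.1f}")
```

This only lines the two columns up side by side; given the data-collection issues described below, it should not be read as a goodness-of-fit test.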
Details of data collection.
We went through the items found in the Google search,
and counted items which satisfied the following criteria.
(1) The data must refer to some explicit time period -- often a single year,
but quite often a longer period like "2004-2008" or "2011-2015".
(2) Either the data/estimate itself looks authoritative, or one can quickly
find an authoritative source of the same number.
For future estimates, we didn't attempt to use our own judgement to assess
the reasonableness of a forecasting methodology; we just checked that it seemed
to have been done by some reputable source, as opposed to a wild guess by a blogger!
(3) We did not double-count items referring to the same underlying data/estimate.
(4) We didn't count endpoints of intervals, such as "between 1.1 and 1.4 trillion".
But we did count "more than 1.4 trillion" in contexts where this implicitly
meant "a little more than 1.4 trillion", that is "between 1.4 trillion and 1.5 trillion".
(5) The stopping rule was to look at 20 pages of Google search results.
Issues in data collection. The practical difficulty was that (1-4) involve some subjective judgement, and that even maintaining a list of the "different" underlying instances requires some effort when the list exceeds 20. The conceptual difficulty is in defining a suitable "stopping rule": using "20 pages" is biased in that there may be multiple items referring to the same instance.
Rounding. Obviously a writer uses "1.4 trillion" as a rounded figure, to mean something like "between 1.35 trillion and 1.45 trillion". This is why we didn't search for "2.0 trillion", since writers might round "2.03 trillion" to either "2.0 trillion" or to "2 trillion" or to "two trillion".
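The rounding convention can be made concrete with a small sketch. It assumes the simplest reading, that a figure quoted to a given number of decimal places stands for anything within half a unit of the last quoted digit; the helper rounding_interval is hypothetical and was not used in the project.

```python
from decimal import Decimal


def rounding_interval(figure: str):
    """Return (low, high) for a figure quoted to the precision shown.

    Assumption: the writer rounded to the last digit shown, so "1.4"
    stands for anything between 1.35 and 1.45.
    """
    d = Decimal(figure)
    # The exponent of the last quoted digit: -1 for "1.4", 0 for "2".
    half_unit = Decimal(1).scaleb(d.as_tuple().exponent) / 2
    return d - half_unit, d + half_unit


for figure in ["1.4", "2.0", "2"]:
    low, high = rounding_interval(figure)
    print(f'"{figure} trillion" is read as between {low} and {high} trillion')
```

Under this reading "2.0" and "2" cover intervals of different widths, which is one way to see why round figures like these were left out of the search.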
Aside. Somewhat related is The Secret Lives of Numbers, which says: "Since 1997, we have collected at intervals a novel set of data on the popularity of numbers: by performing a massive automated Internet search on each of the integers from 0 to 1,000,000 and counting the number of pages which contained each, we have obtained a picture of the Internet community's numeric interests and inclinations."