I REALIZED SOME time ago that I did not understand statistics
well enough to evaluate them. My solution was to stop repeating
statistics in my column. Surely we can agree that homelessness is a
problem that needs to be addressed, whether the number of homeless
in the United States is 300,000 or 3 million.
Numbers are often reported as facts; they look like facts, very
hard and specific. But they're not. They need to be cross-examined:
Where did the numbers come from? How was "homeless" defined --
people on the street for one night, one week, one month, what? Who
counted them, and when?
Even if I knew the answers to those questions, I would be no
closer to divining the truth. I have never taken a course in
statistics. I am, to use the jargon, "innumerate," which is to math
what "illiterate" is to reading. Knowing a little something about
statistics would help me find the small lies, the mistakes and
misinterpretations. (The big lies, where people just make up the
data, cannot be uncovered by just looking at the studies. That's why
we need whistle-blowers.)
So I asked around and finally settled upon two books, one old
("The Cartoon Guide to Statistics" by Larry Gonick and Woolcott
Smith) and one new ("Damned Lies and Statistics," by Joel Best).
I KNEW I was in the right place when I hit Page 46 of the cartoon
guide. The text presented a hypothetical problem. Suppose there is a
new disease infecting one person out of every thousand in the
population. Suppose there is a test for that disease.
If a person has the disease, the test comes back positive 99
percent of the time. Regrettably, the test also produces some false
positives -- 2 percent of uninfected people test positive for the
disease.
I just tested positive. Should I pay for the expensive treatment,
which involves bombarding my body with pearl onions accelerated at
high speeds?
Now me, I'd think there are 99 chances out of a hundred that I
have the disease. Wait, no, 2 percent false positives -- OK, 98
percent chance. I almost certainly have it. Bring on the pearl
onions.
Alas, the truth is that less than 5 percent of those who test
positive for the disease actually have it. There is math to back
that up, but the trick is to consider that 999 people do not have
the disease, and that 2 percent of them is a much greater number
than 99 percent of the 0.1 percent who do have the disease.
It's nonintuitive -- that's why it's called the False Positive
Paradox -- but it's true. It points out how careful we have to be in
handling numbers even when we know the numbers to be entirely
accurate.
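The cartoon guide's arithmetic can be checked directly. Here is a minimal sketch in Python (the numbers -- 1-in-1,000 prevalence, 99 percent sensitivity, 2 percent false positive rate -- all come from the example above; the variable names are mine):

```python
# Bayes' rule check for the hypothetical disease test:
# prevalence 1 in 1,000; 99% of the infected test positive;
# 2% of the uninfected also test positive.
prevalence = 1 / 1000
sensitivity = 0.99           # P(positive | infected)
false_positive_rate = 0.02   # P(positive | not infected)

# Expected share of the population in each group that tests positive.
true_positives = sensitivity * prevalence                  # 0.00099
false_positives = false_positive_rate * (1 - prevalence)   # 0.01998

# Chance you actually have the disease, given a positive test.
p_disease_given_positive = true_positives / (true_positives + false_positives)
print(round(p_disease_given_positive, 4))  # 0.0472 -- under 5 percent
```

The false positives swamp the true ones simply because the uninfected outnumber the infected a thousand to one.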
SOMETIMES, THOUGH, WE don't even know that. We have to watch out
for other things: sample size (the smaller the sample, the more
anomalous the results are likely to be), true randomness (phone
polls, for instance, discriminate against people who don't have
phones and against people who hate pollsters), and even the agenda
of the people promoting the statistics.
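The sample-size caveat has a simple quantitative face: a poll's margin of error shrinks only with the square root of the sample, so a small poll is much noisier than it looks. A sketch (the 1.96 factor is the standard 95-percent-confidence multiplier; none of this is from the books discussed here):

```python
import math

def margin_of_error(n, p=0.5):
    # Rough 95% margin of error for a proportion p from a simple
    # random sample of n people: 1.96 * sqrt(p * (1 - p) / n).
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(100), 3))   # 0.098 -- about plus or minus 10 points
print(round(margin_of_error(1000), 3))  # 0.031 -- about plus or minus 3 points
```

Ten times the sample buys you only about a third of the noise -- which is why a 100-person poll can say almost anything.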
Note: Agenda and motive are not the same thing. Good people can
use bad numbers. For an instructive example of this, see the
discussion of the epidemic of church bombings in the South in
Chapter 2 of Best's book.
"Damned Lies and Statistics" is a sort of catalog of errors,
divided by type, with famous examples. In "The Worst Social
Statistic Ever," the introduction of the book, Best nominates this
sentence, found on numerous Web sites and in journal articles: "Every
year since 1950, the number of American children gunned down has
doubled."
Assuming just one child gunned down in 1950, that would mean
that, by 1987, 137 billion American children would have died in
this gruesome manner -- more than the estimated number of humans who
have ever lived on Earth.
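Best's reductio takes two lines to reproduce: starting from one hypothetical victim in 1950, the count doubles once per year, so by 1987 it has doubled 37 times.

```python
# One victim in 1950, doubling every year: 2 ** (year - 1950).
victims_1987 = 2 ** (1987 - 1950)
print(f"{victims_1987:,}")  # 137,438,953,472 -- about 137 billion
```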
Trying to be smarter, I selected packages of knowledge called books.
Stop all this weeping, swallow your pride; you will not die,
it's not jcarroll@sfchronicle.com.