## The actual two faces of Statistics: to explain or to predict?

Popular books often over-sell the acceptance of Bayesian methods as a revolution in Statistics,
a particularly egregious example being provided by the subtitle
*How Bayes' rule cracked the Enigma Code, hunted down Russian submarines,
and emerged triumphant from two centuries of controversy*.
Of course the advent of desktop computing and then of "Big Data" has changed the way
everyone does Statistics.
The idea that there exist alternative methodologies labeled "frequentist" and "Bayesian"
has an element of truth, but the idea that these are **competing** methodologies is misleading.
I have asked working statisticians:

> Do you know any example of data for which there are reasonable frequentist and Bayesian analyses which give substantially different conclusions?

because this would make an interesting discussion topic in my undergraduate course.
But I have never found such an example.
The lesson is that explicitly Bayesian methods are useful in some contexts, and other methods in
other contexts.
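To see why the two analyses so rarely disagree, consider the simplest possible setting: estimating a binomial proportion. The sketch below (the counts are made up purely for illustration) computes a frequentist 95% confidence interval and a Bayesian 95% credible interval under a flat prior; with any reasonable amount of data the two intervals are nearly identical.

```python
import random

random.seed(0)

# Hypothetical data: 47 successes in 100 trials (illustrative numbers only).
n, k = 100, 47
p_hat = k / n

# Frequentist: Wald 95% confidence interval for the proportion.
se = (p_hat * (1 - p_hat) / n) ** 0.5
freq_lo, freq_hi = p_hat - 1.96 * se, p_hat + 1.96 * se

# Bayesian: a flat Beta(1, 1) prior gives a Beta(k + 1, n - k + 1) posterior;
# approximate the 95% credible interval by posterior sampling.
draws = sorted(random.betavariate(k + 1, n - k + 1) for _ in range(100_000))
bayes_lo, bayes_hi = draws[2_500], draws[97_500]

print(f"frequentist 95% interval: ({freq_lo:.3f}, {freq_hi:.3f})")
print(f"Bayesian    95% interval: ({bayes_lo:.3f}, {bayes_hi:.3f})")
```

The two intervals differ only in the third decimal place here; substantive disagreement would require a strongly informative prior or very little data.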
One view of a more substantial dichotomy was given by Leo Breiman in a paper
*Statistical modeling: The two cultures*:

> One [culture] assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.

A helpful account of the latter culture is given by David Donoho in a paper
*50 years of Data Science*
in a section titled *The Predictive Culture’s Secret Sauce*.
He emphasized that the dramatic success of
Machine Learning has been facilitated by
structured competitions such as the
Netflix Challenge,
where one can judge which prediction methods work best on real data.
My own take on this dichotomy comes from the title of a Galit Shmueli paper:
*To explain or to predict?*
The early- and mid-20th-century development of mathematical statistics
focused on data from science experiments -- seeking to explain whether data were consistent with a model
-- as can be seen from the data sources for Fisher's 1925 *Statistical Methods for Research Workers*.
In the later-20th-century, statistical data about the human social and economic world became more prominent,
but here simple explanatory models reflecting "reality" rather than convenience are much less plausible.
What we now see every day, in Google's search results and Amazon's suggested purchases,
are instead just the output of algorithms **predicting** what we might like based on past data from similar customers.
(None of the three papers above is easy reading, but I encourage undergraduates interested in
the conceptual side of Statistics to look at them.)