Coherent Stochastic Models for Macroevolution
This is joint work with Lea Popovic and Maxim Krikun.
Brief motivation
There is a substantial literature on comparing data on different
aspects of macroevolution -- the evolutionary history of
speciations and extinctions -- with the predictions of simple
``pure chance'' stochastic models.
Available data includes
- the distribution of number of species per genus
- shapes of phylogenetic trees on extant species
- fossil time series -- fluctuations in number of taxa over time
The fit of simple models, and of more elaborate models
incorporating conjectured biological processes,
has been studied in these contexts.
While data-motivated models are scientifically natural,
a mathematical aesthetic suggests a somewhat different approach:
start with a ``pure chance'' model that simultaneously encompasses
all the kinds of data that one might hope to find.
Here are two instances of what one would like such a coherent model to provide.
- Joint description of the phylogenetic tree on an extant clade of species,
its extension to the tree on an observed small proportion of
extinct species, and the (unobserved) entire tree on all
extinct species.
- Joint description of fossil time series at different levels of the taxonomic
hierarchy.
(We emphasize the latter because biological
literature tends to assume that a model can be applied at any level,
without enquiring whether this assumption is logically self-consistent).
Outline of model
Our purpose is to present what is arguably the mathematically
fundamental such model.
The underlying model is simple -- a critical branching process conditioned to
have $n$ lineages at the present time.
Though hardly new in concept, our focus on conditioning to have
$n$ lineages (for comparison with real clades on $n$ extant taxa)
makes our results somewhat new in detail.
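As a rough illustration of the underlying process, here is a minimal sketch that simulates a critical birth-death process and conditions on $n$ lineages by plain rejection sampling. It is not the construction used in the draft paper: the fixed observation time `t_end`, the unit speciation and extinction rates, and the function names `simulate_critical_bd` and `sample_extinct_count` are all assumptions made for this example.

```python
import random


def simulate_critical_bd(t_end, rng, rate=1.0):
    """Run a critical birth-death process from one lineage up to time t_end.

    Each lineage speciates at `rate` and goes extinct at `rate` (assumed values).
    Returns (lineages alive at t_end, total species ever created).
    """
    alive, total, t = 1, 1, 0.0
    while alive > 0:
        # Waiting time to the next event among `alive` lineages.
        t += rng.expovariate(2.0 * rate * alive)
        if t >= t_end:
            break
        if rng.random() < 0.5:  # speciation and extinction are equally likely
            alive += 1
            total += 1
        else:
            alive -= 1
    return alive, total


def sample_extinct_count(n, t_end, rng, max_tries=1_000_000):
    """Rejection-sample a clade with exactly n lineages at t_end.

    Returns the number of extinct species in the accepted clade.
    """
    for _ in range(max_tries):
        alive, total = simulate_critical_bd(t_end, rng)
        if alive == n:
            return total - alive
    raise RuntimeError("no clade accepted; try a different t_end or more tries")


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5
    counts = [sample_extinct_count(n, t_end=float(n), rng=rng) for _ in range(200)]
    print("mean number of extinct species:", sum(counts) / len(counts))
```

Rejection sampling is only practical for small $n$, since most simulated clades die out or end at the wrong size. Choosing `t_end` proportional to $n$ is a convenience for this sketch, not a claim about how the time of origin is treated in the model.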
To model higher-order taxa (genera, say) we start by assuming that each new species
has some chance to be sufficiently different that it should be
considered a new genus. The remaining details of defining genera, bearing
in mind one desires monophyletic genera, can be handled in several
different ways (see draft paper for details).
Part of the project is to examine whether these different
schemes for defining genera make a qualitative difference.
Overview of results
Our results are derived as asymptotics for large $n$, even though
we envisage using them for rather small values, say $n = 20$.
- We draw attention to some basic scaling results:
  - the $n^2$ law: that in a clade of $n$ extant species
    one expects order $n^2$ extinct species
  - the $n$ law: that the time since clade origin or since last common
    ancestor is order $n$ times the
    mean species lifetime
  - the $1/r$ law: that with probability $1/r$ there was
    some past time at which the number of species was at least $r$
    times the present number
  - the $1/n$ law: that the probability a given extinct
    species is ancestor to some extant species is order $1/n$
  - and the constant law: that the probability that
    a given extant species is descendant of some other extant species
    has non-zero limit as $n \to \infty$.
- A ``local'' description of the probability structure of large clades, which permits
easy calculations
- A ``loss of evolutionary history under random extinctions''
calculation within our model.
- Joint distribution of the time back to the origin of the clade, the time back to the
last common ancestor, and the number of species at that time
- The shape of phylogenetic trees on higher taxa becomes more unbalanced.
- We compare typical fluctuation rates of taxon counts at
different levels of the hierarchy
- Our model has more intrinsic variability than previous models, and
therefore provides a more conservative approach to inferring
biological mechanism (rather than ``just chance'') from evolutionary history.
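The $1/r$ law listed above has a simple forward-time analogue that can be checked numerically. In a critical birth-death process the species count moves up or down by one with equal probability at each event, so starting from $m$ species the chance of reaching $rm$ before extinction is exactly $1/r$ (the classical gambler's ruin formula). The sketch below estimates that probability by Monte Carlo. It illustrates the analogue only, not the backward-in-time statement for a conditioned clade, and the function names and parameters are assumptions made for this example.

```python
import random


def hits_level_before_extinction(start, r, rng):
    """Follow the jump chain of a critical birth-death process.

    The species count moves up or down by one with equal probability.
    Returns True if it reaches r * start before reaching 0.
    """
    target = r * start
    count = start
    while 0 < count < target:
        count += 1 if rng.random() < 0.5 else -1
    return count == target


def estimate_hit_probability(start, r, trials, seed=0):
    """Monte Carlo estimate of P(reach r * start before 0), for integer r >= 1."""
    rng = random.Random(seed)
    hits = sum(hits_level_before_extinction(start, r, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for r in (2, 4, 8):
        est = estimate_hit_probability(start=5, r=r, trials=4000)
        print(f"r = {r}: estimate {est:.3f}, exact value {1 / r:.3f}")
```

Only the embedded jump chain is simulated, because the hitting probability does not depend on the exponential waiting times between events.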