Coherent Stochastic Models for Macroevolution
This is joint work with Lea Popovic and Maxim Krikun.
Brief motivation
There is a substantial literature comparing data on different
aspects of macroevolution (the evolutionary history of
speciations and extinctions) with the predictions of simple
``pure chance'' stochastic models.
Available data include:
 the distribution of number of species per genus
 shapes of phylogenetic trees on extant species
 fossil time series (fluctuations in number of taxa over time)
The fit of simple models, and of more elaborate models
incorporating conjectured biological processes,
has been studied in these contexts.
While data-motivated models are scientifically natural,
a mathematical aesthetic suggests a somewhat different approach:
start with a ``pure chance'' model which encompasses simultaneously
all the kinds of data that one might hope to find.
Here are two instances of what one would like such a coherent model to provide.

Joint description of the phylogenetic tree on an extant clade of species,
its extension to the tree on an observed small proportion of
extinct species, and the (unobserved) entire tree on all
extinct species.
 Joint description of fossil time series at different levels of the taxonomic
hierarchy.
(We emphasize the latter because the biological
literature tends to assume that a model can be applied at any level,
without enquiring whether this assumption is logically self-consistent.)
Outline of model
Our purpose is to present what is arguably the mathematically
fundamental such model.
The underlying model is simple: a critical branching process conditioned to
have $n$ lineages at the present time.
Though hardly new in concept, our focus on conditioning to have
$n$ lineages (for comparison with real clades on $n$ extant taxa)
makes our results somewhat new in detail.
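As a rough illustration of the underlying model, the following Python sketch simulates a critical linear birth-death process and conditions on the number of lineages by naive rejection sampling. It is only a sketch under stated assumptions, not the construction used in the project: the unit speciation and extinction rates, the fixed observation time `t_max`, and the function names `simulate_critical_bd` and `sample_conditioned` are all introduced here for illustration.

```python
import random


def simulate_critical_bd(t_max, rng, max_events=100_000):
    """Simulate a critical linear birth-death process from one lineage.

    Assumption: each lineage speciates at rate 1 and goes extinct at
    rate 1, so the mean species lifetime is 1. Returns a list of
    (time, number_of_lineages) pairs, starting with (0.0, 1) and with
    one further entry per event occurring before t_max.
    """
    t, n = 0.0, 1
    path = [(t, n)]
    for _ in range(max_events):
        if n == 0:
            break  # the clade has gone extinct
        # With n lineages the total event rate is 2n.
        t += rng.expovariate(2.0 * n)
        if t > t_max:
            break  # next event falls after the observation time
        # Criticality: speciation and extinction are equally likely.
        n += 1 if rng.random() < 0.5 else -1
        path.append((t, n))
    return path


def sample_conditioned(n_target, t_max, rng, max_tries=100_000):
    """Rejection-sample a path with exactly n_target lineages at t_max.

    Assumption: the clade age t_max is fixed in advance, which is a
    simplification made only for this sketch. Returns None if no
    path is accepted within max_tries attempts.
    """
    for _ in range(max_tries):
        path = simulate_critical_bd(t_max, rng)
        if path[-1][1] == n_target:
            return path
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    path = sample_conditioned(n_target=3, t_max=2.0, rng=rng)
    print("events:", len(path) - 1, "lineages at t_max:", path[-1][1])
```

Rejection sampling is the simplest way to express the conditioning, but it becomes slow for large $n$; it is used here only because it makes the conditioning explicit.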
To model higher-order taxa (genera, say) we start by assuming that each new species
has some chance to be sufficiently different that it should be
considered a new genus. The remaining details of defining genera, bearing
in mind that one wants monophyletic genera, can be handled in several
different ways (see draft paper for details).
Part of the project is to examine whether these different
schemes for defining genera make a qualitative difference.
Overview of results
Our results are derived as asymptotics for large $n$, even though
we envisage using them for rather small values, say $n = 20$.

We draw attention to some basic scaling results:
 the $n^2$ law: that in a clade of $n$ extant species
one expects order $n^2$ extinct species
 the $n$ law: that the time since clade origin or since last common
ancestor is order $n$ times the
mean species lifetime
 the $1/r$ law: that with probability $1/r$ there was
some past time at which the number of species was at least $r$
times the present number
 the $1/n$ law: that the probability a given extinct
species is ancestor to some extant species is order $1/n$
 and the constant law: that the probability that
a given extant species is a descendant of some other extant species
has a nonzero limit as $n \to \infty$.
 A ``local'' description of the probability structure of large clades, which permits
easy calculations
 A ``loss of evolutionary history under random extinctions''
calculation within our model.
 Joint distribution of the time back to the origin of the clade, the time back to the
last common ancestor, and the number of species at that time
 The shape of phylogenetic trees on higher taxa becomes more unbalanced.
 We compare typical fluctuation rates of taxon counts at
different levels of the hierarchy
 Our model has more intrinsic variability than previous models, and
therefore provides a more conservative approach to inferring
biological mechanism (rather than ``just chance'') from evolutionary history.