### Wright-Fisher diffusions with negative mutation rate!

The simplest case of the Wright-Fisher diffusion models the proportions (X_1,...,X_k) of genes (in an infinite-population limit) of each of k allelic types when the underlying mutation rates (type i to type j) are constant in i and j. It is a diffusion on the simplex

$\Delta = \{x = (x_1,..., x_k) : x_i \geq 0, \sum_i x_i = 1\}$

whose drift is toward the center point x_* of the simplex

$\mu(x) = x_* - x,$

and with a certain variance rate $\sigma^2(x)$. This diffusion has a stationary distribution in the interior of the simplex, and sample paths never hit the boundary of the simplex.

Mathematically it makes sense to consider the diffusion on the simplex with the same variance rate but with the sign of the drift reversed:

$\mu(x) = -(x_* - x).$

This process will hit the boundary, at which time we stop the process. Such a process has no obvious interpretation as a genetic model, but arises in the following setting.
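As an aside before turning to that setting, the reversed-drift process is easy to explore numerically. The following is a minimal Euler-Maruyama sketch, not a definitive implementation. The text leaves the variance rate unspecified, so the code assumes the standard Wright-Fisher covariance $x_i(\delta_{ij} - x_j)$, and its normalization relative to the drift is also an assumption. The function name `simulate_reversed_drift` and its parameters are hypothetical.

```python
import math
import random


def simulate_reversed_drift(x0, dt=1e-4, max_steps=1_000_000, seed=0):
    """Euler-Maruyama sketch of the reversed-drift diffusion on the simplex.

    Returns (time, state) at the first step where some coordinate is <= 0,
    or after max_steps steps if that never happens.
    """
    rng = random.Random(seed)
    k = len(x0)
    x = list(x0)
    center = 1.0 / k  # each coordinate of the center point x_*
    sqrt_dt = math.sqrt(dt)
    for step in range(max_steps):
        if min(x) <= 0.0:
            return step * dt, x  # stopped at (or just past) the boundary
        # Reversed drift: mu(x) = -(x_* - x) = x - x_*.
        dx = [(xi - center) * dt for xi in x]
        # ASSUMPTION: Wright-Fisher noise. Antisymmetric pairwise increments
        # give covariance x_i (delta_ij - x_j) dt and sum to zero, so the
        # coordinates keep summing to 1 up to rounding error.
        for i in range(k):
            for j in range(i + 1, k):
                w = math.sqrt(x[i] * x[j]) * sqrt_dt * rng.gauss(0.0, 1.0)
                dx[i] += w
                dx[j] -= w
        x = [xi + di for xi, di in zip(x, dx)]
    return max_steps * dt, x
```

Calling `simulate_reversed_drift([1/3, 1/3, 1/3])` starts at the center point and returns an approximate hitting time together with the exit state. Because the scheme is discrete, the returned state may lie slightly outside the simplex.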

Consider the space of all cladograms (phylogenetic trees) on n species. This paper and this paper study the Markov chain on this space defined by

• pick a leaf at random and re-attach to a random edge.
We want to consider the $n \to \infty$ limit as a diffusion process on some space of continuum trees, in the spirit of other models in this paper. Consider some initial cladogram, and consider some branchpoint in the cladogram. The branchpoint specifies a partition of leaves into three sets, of relative sizes (X_1,X_2,X_3). As the chain runs, these proportions change as a certain Markov chain, until the branchpoint disappears. With appropriate time-rescaling, the $n \to \infty$ limit is the diffusion process above.
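The finite-$n$ chain and the branchpoint proportions can be sketched in code. This is an illustration under stated assumptions: cladograms are taken to be unrooted binary trees, the re-attachment edge is chosen uniformly among the edges of the tree left after the leaf is removed, and the initial tree is grown by sequential random insertion purely for convenience. The class `Cladogram` and its methods are hypothetical names.

```python
import random


class Cladogram:
    """Unrooted binary tree on leaves 0..n-1; branchpoints get labels >= n."""

    def __init__(self, n, rng):
        self.n, self.rng = n, rng
        self.next_label = n + 1  # fresh labels, never reused
        # Start from the 3-leaf star, then insert the remaining leaves.
        self.adj = {0: {n}, 1: {n}, 2: {n}, n: {0, 1, 2}}
        for leaf in range(3, n):
            self._attach(leaf, rng.choice(self.edges()))

    def edges(self):
        return sorted((p, q) for p in self.adj for q in self.adj[p] if p < q)

    def _attach(self, leaf, edge):
        """Subdivide `edge` with a new branchpoint and hang `leaf` from it."""
        p, q = edge
        b = self.next_label
        self.next_label += 1
        self.adj[p].remove(q)
        self.adj[q].remove(p)
        self.adj[p].add(b)
        self.adj[q].add(b)
        self.adj[b] = {p, q, leaf}
        self.adj[leaf] = {b}

    def move(self):
        """One step: pick a random leaf and re-attach it to a random edge."""
        leaf = self.rng.randrange(self.n)
        (b,) = self.adj[leaf]          # branchpoint the leaf hangs from
        u, v = self.adj[b] - {leaf}    # its two other neighbours
        del self.adj[b], self.adj[leaf]
        self.adj[u].remove(b)
        self.adj[v].remove(b)
        self.adj[u].add(v)             # merge the two edges through b
        self.adj[v].add(u)
        self._attach(leaf, self.rng.choice(self.edges()))

    def split(self, b):
        """Leaf proportions of the three subtrees around branchpoint b."""
        sizes = []
        for start in sorted(self.adj[b]):
            seen, stack, count = {b, start}, [start], 0
            while stack:
                node = stack.pop()
                count += node < self.n  # leaves are labelled 0..n-1
                for nb in self.adj[node]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            sizes.append(count / self.n)
        return sizes
```

Since branchpoint labels are never reused, a loop such as `while b in tree.adj: tree.move()` follows the proportions `tree.split(b)` until the branchpoint `b` disappears.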

More interestingly, take $b$ initial branchpoints. These partition the leaves into $2b+1$ sets, of which $b+2$ are "external" to the spanning tree of the branchpoints, and $b-1$ are internal (associated with edges of the spanning tree). So there is a chain giving the proportionate sizes (X_1,...,X_{b+2}; Y_1,...,Y_{b-1}) of these leaf-sets, which we can run until some branchpoint disappears. Its $n \to \infty$ time-rescaled limit is a multidimensional diffusion process generalizing the one above, now with drift $x_i - 1/3$ for the "external" sets and drift $y_j + 1/3$ for the "internal" sets, run until hitting some boundary face of the simplex.
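As a quick consistency check on the stated drifts (a worked calculation using only the quantities above), the counts add up as $(b+2) + (b-1) = 2b+1$, and the drift components sum to zero, as they must for a process that stays on the simplex:

$\sum_{i=1}^{b+2} \left(x_i - \tfrac13\right) + \sum_{j=1}^{b-1} \left(y_j + \tfrac13\right) = \left(\sum_i x_i + \sum_j y_j\right) - \tfrac{b+2}{3} + \tfrac{b-1}{3} = 1 - 1 = 0.$

For $b = 1$ there are three external sets and no internal ones, and the drift $x_i - 1/3$ is exactly $-(x_* - x)$ with $x_* = (1/3, 1/3, 1/3)$, the reversed-drift case above.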

It's now intuitively clear that these diffusion processes are recording certain aspects of an underlying "diffusion on continuum trees". Up to technicalities, the result of this paper implies the limit has positive spectral gap.

PROBLEM. Give a rigorous construction of this "diffusion on continuum trees".

PROBLEM. Can one do any explicit calculations with this process?

Update (February 2018). In a sequence of papers available under the title The Aldous diffusion on continuum trees, Forman, Pal, Rizzolo, and Winkel have given a detailed description of the multidimensional diffusion described above.

Update (June 2018). Using a rather different formalism, Lohr - Mytnik - Winter prove existence of a limit diffusion on continuum trees.

History. Problem described in a talk at the Fields Institute, March 1999.