By far the most popular stochastic model for reproduction in population genetics is the Wright-Fisher model (developed implicitly by Fisher in [4] and earlier papers, and explicitly by Wright [14]). It is, of course, highly idealized in the form presented below. However, it can be (and has been) generalized and its assumptions relaxed, and even in its pure form it succeeds in capturing the essence of the biology involved.

Wright-Fisher has many of the same assumptions as Hardy-Weinberg
Equilibrium, with the important exception of finite population size **N**
(after all, it is the effects of sampling gametes in a finite population
that we are interested in modeling). The really crucial assumptions are:

- finite and constant **N**,
- random mating with respect to the gene being studied, and
- non-overlapping generations.

First, note that because the allelic variants are assumed to be neutral,
it makes no difference to the fate of individuals (and thus the descent of
their alleles) how the alleles are distributed among individuals. We can
therefore treat a haploid model as equivalent to a diploid model and, in
general, whatever the ploidy level, take the "individuals" in the model to
be gametes, without regard to their arrangement in the organisms
themselves. Because most organisms of biological study are diploid, we
will keep that convention, but the only effect in this case is that our
population size is **2N** gametes, rather than **N**. The remaining
necessary specifications to the model are as follows:

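- 2 alleles, and
- $X_t$, the number of copies of one of them (the focal allele) among the **2N** gametes at time **t**.

In the Wright-Fisher model, we imagine that gametes are chosen randomly each generation from an effectively infinite gamete pool reflecting the parental allele frequencies. Then the sampling is binomial, and

$$P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}, \qquad 0 \le i, j \le 2N.$$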
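The binomial resampling step described above can be sketched in a few lines of Python. This is only an illustrative simulation under the stated assumptions (two alleles, constant **2N** gametes, no mutation or selection); the function names `next_generation` and `wright_fisher_path` are hypothetical and not part of the original notes.

```python
import random


def next_generation(x, two_n, rng):
    """Draw the next allele count given the current count x.

    Each of the 2N gametes of the next generation carries the focal
    allele independently with probability x / 2N, so the result is a
    Binomial(2N, x / 2N) draw.
    """
    p = x / two_n
    return sum(rng.random() < p for _ in range(two_n))


def wright_fisher_path(x0, two_n, generations, seed=None):
    """Return the list of allele counts X_0, X_1, ..., X_generations."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(generations):
        path.append(next_generation(path[-1], two_n, rng))
    return path


if __name__ == "__main__":
    # One sample path for 2N = 20 gametes starting from 10 copies.
    print(wright_fisher_path(x0=10, two_n=20, generations=30, seed=1))
```

Once a path reaches **0** or **2N** it stays there, since the sampling probability is then exactly 0 or 1.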

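Recall that one of the implications of Hardy-Weinberg was that under random mating and absent any directional perturbing forces such as mutation and selection, genetic systems will be at a stable equilibrium. Here, although we are allowing stochastic fluctuations in the allele count $X_t$ from generation-to-generation sampling, there is no directionality expected in the changes. This, plus the observation that $X_t$ is Markovian, justifies the assertion that

$$E[X_{t+1} \mid X_0, \ldots, X_t] = E[X_{t+1} \mid X_t] = 2N \cdot \frac{X_t}{2N} = X_t,$$

and now we see that $X_t$ is also a martingale, with two
possible limits, **0** and **2N**. We can further write

$$E[X_{t+1}] = E\big[\,E[X_{t+1} \mid X_t]\,\big] = E[X_t]$$

and derive

$$E[X_t] = X_0 \qquad \text{for all } t \ge 0.$$

It follows from the stopping time theorem for bounded martingales that the probability of being absorbed at either of the two boundaries is

$$P(X_\tau = 2N) = \frac{X_0}{2N}, \qquad P(X_\tau = 0) = 1 - \frac{X_0}{2N},$$

where $\tau$ is the first time $X_t$ reaches **0** or **2N**. This holds because $X_0 = E[X_\tau] = 0 \cdot P(X_\tau = 0) + 2N \cdot P(X_\tau = 2N)$.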

We are interested mainly in the situation where a new allele has entered a monomorphic population (through, perhaps, mutation). This result tells us that when the new mutant enters the population (in a single copy among the **2N** gametes), the probability that it eventually fixes and replaces the resident is its frequency, $1/(2N)$.
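As a numerical check on the fixation probability, one can build the binomial transition probabilities explicitly and push the distribution of the allele count forward until nearly all mass sits on **0** and **2N**. The sketch below assumes the standard binomial transition rule of the Wright-Fisher model; the function names and the default of 2000 generations are illustrative choices, adequate for small **2N** only.

```python
from math import comb


def transition_row(i, two_n):
    """Binomial probabilities of moving from count i to each count j = 0..2N."""
    p = i / two_n
    return [comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
            for j in range(two_n + 1)]


def fixation_probability(x0, two_n, generations=2000):
    """Approximate P(absorption at 2N) by iterating the chain's distribution.

    Mass on the interior states decays geometrically, so after enough
    generations the mass on state 2N approximates the fixation probability.
    """
    rows = [transition_row(i, two_n) for i in range(two_n + 1)]
    dist = [0.0] * (two_n + 1)
    dist[x0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * rows[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[two_n]


if __name__ == "__main__":
    two_n = 10
    # Martingale property: the expected next count equals the current count.
    row = transition_row(3, two_n)
    print(sum(j * pj for j, pj in enumerate(row)))
    # A single new mutant: the result should be close to 1 / (2N).
    print(fixation_probability(1, two_n), 1 / two_n)
```

The first printed value illustrates the martingale property for one starting state, and the second compares the iterated result with the frequency of a single copy.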

There are other ways to derive this result, one being to solve the Markov
chain directly. Another makes use of the "coalescent"
reasoning described earlier by considering the genealogy of alleles in the
following way: at time 0, there will be **2N** gametes in the population,
any of which might or might not leave descendants in the next generation.
If they do not, the lineage of that allele copy is extinct in the
population. If we follow the population through time, eventually all but
one of the **2N** original lineages will be extinct, and the remaining one
will be fixed in the population. Because all of the original gametes have
equal probability of generating the surviving lineage, the fixation
probability of any allelic type is simply the frequency of that type.
Although this is simply a verbal argument, the genealogical perspective
underlying it is an extremely powerful one in analyzing molecular sequence
data, and it is thus worth thinking about some long-solved problems in
this way.
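The verbal genealogical argument can be illustrated with a small simulation that follows lineages rather than allele counts. The sketch assumes that neutral Wright-Fisher reproduction is equivalent to every gamete in the next generation copying a parent chosen uniformly at random from the current **2N**; the function names are hypothetical. Each founder should end up as the sole surviving lineage in roughly a fraction $1/(2N)$ of replicates.

```python
import random
from collections import Counter


def surviving_founder(two_n, rng):
    """Return the label of the founding gamete whose lineage takes over."""
    # Generation 0: every gamete is its own founder.
    ancestors = list(range(two_n))
    while len(set(ancestors)) > 1:
        # Each new gamete inherits the founder label of a random parent.
        ancestors = [ancestors[rng.randrange(two_n)] for _ in range(two_n)]
    return ancestors[0]


def founder_frequencies(two_n, replicates, seed=None):
    """Fraction of replicates in which each founder's lineage survives."""
    rng = random.Random(seed)
    counts = Counter(surviving_founder(two_n, rng) for _ in range(replicates))
    return [counts[k] / replicates for k in range(two_n)]


if __name__ == "__main__":
    # With 2N = 4, each founder should win in about a quarter of runs.
    print(founder_frequencies(two_n=4, replicates=4000, seed=0))
```

An allelic type carried by several founders fixes whenever any one of them is the surviving lineage, so its fixation probability is the sum of these equal shares, which is its initial frequency.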

Tue May 12 11:50:21 PDT 1998