Frederick (a.k.a. Erick) A. Matsen
Miller Institute Fellow, UC Berkeley, Dept. Statistics, 367 Evans Hall #429, Berkeley, CA 94720-3860, USA. ph: +1 510 642 2450, fax: +1 510 642 7892
I develop mathematical techniques and computer algorithms to improve our understanding of evolution. My current research is motivated by two main questions. First, how can we (better) reconstruct evolutionary history from present-day DNA sequences? Second, how do organisms diversify (e.g. speciate)?
How can we (better) reconstruct evolutionary history from present-day DNA sequences?
This is a very big and somewhat old question, with hundreds of scientists working on different aspects. The field that has developed around it is called phylogenetics. The general idea is that organisms with similar DNA sequences are usually more closely related than organisms with quite different DNA sequences. Making this formal and then running many computations on a computer leads to a tree diagram showing interrelationships; this diagram is called a phylogenetic tree. My contributions to this big project are in two areas: phylogenetic mixtures and theoretical analysis of Bayesian methods.
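To make the similarity idea above concrete, here is a minimal sketch in Python. It counts the sites at which aligned sequences differ and reports the most similar pair. The toy alignment and the function names are invented for illustration, and real phylogenetic methods use explicit models of sequence evolution rather than raw difference counts.

```python
from itertools import combinations
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_pair(seqs: Dict[str, str]) -> Tuple[str, str]:
    """Return the pair of names whose sequences differ at the fewest sites.

    Ties are broken by taking the first pair in sorted name order.
    """
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: hamming(seqs[pair[0]], seqs[pair[1]]),
    )


# Hypothetical toy alignment, invented for illustration only.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "TCGAACGGTC",
    "D": "TCGAACGGTA",
}

if __name__ == "__main__":
    for x, y in combinations(sorted(toy), 2):
        print(x, y, hamming(toy[x], toy[y]))
    print("closest pair:", closest_pair(toy))
```

In this toy data, A and B differ at one site, as do C and D, while sequences from different pairs differ at three or more sites. A distance-based method would therefore group A with B and C with D.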
Mixtures: It is well established from a theoretical perspective that if sequences evolve under a single (simple) model then a large amount of sequence data will reconstruct the tree correctly with high probability. However, it is now known that different parts of a sequence evolve in different ways; this is formulated statistically as a phylogenetic mixture model. In contrast to the single-process case, it is known that data from mixtures of processes does not uniquely determine a tree. Mike Steel and I recently realized that even more is true: it is possible to have a mixture of two processes on one tree such that the resulting data looks exactly like a single process on a different tree. I'm now interested in whether these sorts of issues really do pose a problem for phylogenetics researchers. Recent work with Mossel and Steel partially addresses this question through a combination of geometric and combinatorial means.
Bayesian: There are many different ways of building phylogenetic trees, and one class of these is known as Bayesian methods. One advantage of these methods is that they can give posterior probabilities, which are (more or less) an estimate of how correct certain parts of the tree are. However, it can happen that even if there is no actual evidence determining how a certain set of species evolved, the methods can choose one scenario and attach a very high posterior probability to it. This problem is called the “star tree paradox,” and Mike Steel and I recently showed analytically that it can persist even when the methods are given arbitrarily long DNA sequences.
How do organisms diversify?
This is an even older question, with a correspondingly bigger literature. My focus is on only one approach, which is based on looking at the "shape" or overall structure of phylogenetic trees. A quick review of a couple of virus trees shows that different evolutionary scenarios can lead to different tree shapes.
In order to use tree shape in a scientific fashion, we need ways of quantifying it. So far I have written about tree shape in three ways: geometric, algebraic/combinatorial, and recursive. The recursive (optimization) approach has been the most productive for applications to data. I am currently applying this framework with Katherine St. John to search for evidence of bias in modern tree reconstruction algorithms. I am also collaborating with Alexei Drummond to apply these techniques to test for coalescent model mis-specification.
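As a small illustration of what quantifying tree shape can mean, the sketch below computes the Colless index, a standard imbalance statistic from the literature. It is not the geometric, algebraic/combinatorial, or recursive framework described above. The nested-tuple tree encoding and the function names are assumptions made for this example.

```python
def leaf_count(tree) -> int:
    """Number of leaves in a binary tree given as nested 2-tuples.

    Any value that is not a tuple is treated as a leaf.
    """
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def colless(tree) -> int:
    """Colless index: sum over internal nodes of |leaves(left) - leaves(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance_here = abs(leaf_count(left) - leaf_count(right))
    return imbalance_here + colless(left) + colless(right)


# Two hypothetical four-leaf trees with the same leaves but different shapes.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

if __name__ == "__main__":
    print("balanced:", colless(balanced))
    print("caterpillar:", colless(caterpillar))
```

The fully balanced tree scores 0 and the caterpillar (ladder-like) tree scores 3, the largest possible value for four leaves. A single number like this can separate shapes that different evolutionary scenarios tend to produce.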
Other projects
In the past, John Wakeley and I investigated a class of models between the lattice model and the island model, and were able to show that these models converge back to the island model as the number of subpopulations goes to infinity. For this project I applied some nice theory about random walks on graphs.
I have also worked on the evolution of language with Martin Nowak. Rather than approach learning theory from the classical angle of an idealized teacher-learner pair, we investigated a model where the agents try to find a common language. We found some remarkably simple individual strategies which led to the population finding a common language with high probability given some constraints on the underlying space of languages.